The case of the supersized shebang

By Jonathan Corbet
February 18, 2019

Regressions are an unavoidable side effect of software development; the kernel is no different in that regard. The 5.0 kernel introduced a change in the handling of the "#!" (or "shebang") lines used to indicate which interpreter should handle an executable text file. The problem has been duly fixed, but the incident shows how easy it can be to introduce unexpected problems and highlights some areas where the kernel's development process does not work as well as we might like.

By longstanding Unix convention, an attempt to execute a file that does not have a recognized binary format will result in that file being passed to an interpreter. By default, the interpreter is a shell, which will interpret the file as a shell script. If, however, the file starts with the characters "#!", the remainder of the first line will be treated as the name of the interpreter to use (and possibly arguments to be passed to that interpreter). This mechanism allows programs written in almost any interpreted language to be executed directly; the user need never know which interpreter is actually doing the work behind the scenes.

[Update: as noted in the comments, the above behavior is the result of both kernel and user-space code; in particular, the default to a shell is implemented within current shells and C libraries.]
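The user-space half of this arrangement can be sketched in a few lines. The Python below is an illustration only, not the code found in any shell or C library: the helper name run_with_shell_fallback() is made up for this example, and it assumes that /bin/sh exists. It relies on the subprocess module reporting a failed execve() call as an OSError carrying the kernel's error number.

```python
import errno
import subprocess


def run_with_shell_fallback(path, args=()):
    """Run `path`; if the kernel rejects its format, hand it to /bin/sh.

    Hypothetical helper that imitates the fallback described above.
    """
    try:
        # The kernel handles binaries and files with a usable "#!" line.
        return subprocess.run([path, *args], capture_output=True, text=True)
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise  # missing file, permission problem, and so on
        # "Exec format error": treat the file as a shell script instead.
        return subprocess.run(
            ["/bin/sh", path, *args], capture_output=True, text=True
        )
```

An executable file containing only "echo hello" would fail the first attempt with ENOEXEC and then run under the shell; a file with a usable shebang line never reaches the fallback.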

The array used to hold the shebang line is defined to be 128 bytes in length. That naturally leads to the question of what happens if the line exceeds that length. In current kernels, the line will simply be truncated to fit the buffer, after which execution proceeds as normal. Or, at least, as normal as can be expected given that part of the shebang line is now missing. Recently, Oleg Nesterov decided that this behavior is wrong; it could cause misinterpreted arguments or, should the truncated line happen to be the valid name of an interpreter executable in its own right, run the wrong interpreter entirely. He put together a patch (merged for 5.0-rc1) changing that behavior; the kernel would fail the attempt to find an alternative interpreter entirely in that situation, causing a fallback to the default shell.
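The difference between the two policies can be modeled in a few lines of Python. This is a simplified sketch rather than the kernel's code: parse_shebang() and its strict flag are names invented for this example, and the model assumes that the interpreter name ends at the first whitespace and that the rest of the line is passed along as a single argument.

```python
BINPRM_BUF_SIZE = 128  # size of the shebang buffer, as described above


def parse_shebang(first_bytes, strict=False):
    """Return [interpreter] or [interpreter, argument], or None on failure.

    strict=False models the traditional truncating behavior;
    strict=True models the 5.0-rc1 change, which refuses a line that
    does not fit in the buffer.
    """
    if not first_bytes.startswith(b"#!"):
        return None
    # One byte of the buffer is reserved for the terminating NUL.
    buf = first_bytes[:BINPRM_BUF_SIZE - 1]
    newline = buf.find(b"\n")
    if newline == -1:
        if strict and len(first_bytes) > len(buf):
            return None  # the line was cut off: fail instead of guessing
        line = buf
    else:
        line = buf[:newline]
    line = line[2:].strip(b" \t")
    if not line:
        return None
    # Interpreter name first; everything after it is one argument.
    return line.split(None, 1)
```

With a line that fits, both modes agree. With a line that is too long, the permissive mode returns the interpreter plus a clipped argument, while the strict mode returns None, which is the case that sends user space to its fallback.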

Trouble for NixOS

The NixOS distribution, it seems, takes an unusual approach to the management of scripts. As noted in a problem report posted by Samuel Dionne-Riel on February 13, NixOS scripts can have shebang lines like:

    #! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl
       -I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl
       -I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl
       -I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl
       -I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl

This line has been split for (relative) ease of reading; it is all a single line in the files themselves. This line exceeds the maximum length by a fair amount, triggering the new code. The end result is that the Perl interpreter is not invoked as expected and the attempt to execute the file fails. User-space code reacts by passing the script to a shell, which rather messily fails to do the right thing with it. In other words, a change intended to prevent scripts from being passed to the wrong interpreter caused the system to start passing scripts to the wrong interpreter. The NixOS developers, rightly, saw this change as a regression; something that used to work no longer does with the 5.0 kernel.
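To see where the cut lands, one can rebuild the line and clip it at the limit. The sketch below assumes that the pieces shown above are joined by single spaces in the real file and that 127 bytes of the line survive (the 128-byte buffer minus a terminating NUL); the function name split_at_limit() is invented for this example.

```python
SHEBANG_PARTS = [
    "#! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl",
    "-I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl",
    "-I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl",
    "-I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl",
    "-I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl",
]

USABLE_BYTES = 127  # assumed: 128-byte buffer minus the trailing NUL


def split_at_limit(parts, limit=USABLE_BYTES):
    """Join the parts into one line and split it at the truncation point."""
    line = " ".join(parts)
    return line[:limit], line[limit:]


kept, lost = split_at_limit(SHEBANG_PARTS)
print("full length:", len(kept) + len(lost))
print("kept:", kept)
print("bytes lost:", len(lost))
```

Under those assumptions the interpreter path survives intact and the cut falls inside the first -I option, so the kernel would hand Perl a clipped include path and none of the later ones.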

One might well wonder just how things worked before, since a truncated version of that shebang line is still wrong. It turns out that the Perl interpreter is able to detect this truncation; it rereads the first line itself and sets its arguments properly. As long as the interpreter itself is the correct one, things will work as expected. As of 5.0-rc1, though, the correct interpreter would no longer be invoked, and things went downhill from there.
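The rereading trick is simple to imitate. The following Python sketch shows the general idea only; it is not Perl's implementation, and the function name reread_shebang_args() and the rule of matching the interpreter by name are assumptions made for this example.

```python
import os


def reread_shebang_args(script_path, interpreter_name="perl"):
    """Recover the interpreter's options from the script's own first line.

    Hypothetical sketch: an interpreter that does this need not trust
    the (possibly truncated) arguments it received from the kernel.
    """
    with open(script_path, "rb") as script:
        first_line = script.readline()
    if not first_line.startswith(b"#!"):
        return []
    words = first_line[2:].decode("utf-8", "replace").split()
    # Only trust the line if it names this interpreter.
    if not words or interpreter_name not in os.path.basename(words[0]):
        return []
    return words[1:]
```

Because the script file is read directly, the result does not depend on how many bytes of the line the kernel kept, and every option on the line is seen separately.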

The kernel project's policy on this kind of change is clear, but Linus Torvalds reiterated it in this case anyway:

It doesn't matter if it "corrupted" things by truncating it. All that matters is "it used to work, now it doesn't"

Yes, maybe it never *should* have worked. And yes, it's sad that people apparently had cases that depended on this odd behavior, but there we are.

The change has since been reverted, so NixOS will be able to run 5.0 kernels. There is work being done to achieve the original goal (preventing the kernel from possibly running the wrong interpreter) while not breaking existing users; that is proving harder than one might expect and will almost certainly have to wait for 5.1.

Regressions in stable kernels

Had that been the end of the story, it would have been just another case of a regression introduced during the merge window, then corrected during the stabilization period. But, as it happens, this change found its way into the 4.20.8, 4.19.21, 4.14.99, and 4.9.156 stable kernel updates, despite the fact that neither the author nor the maintainer who merged it (Andrew Morton) had marked it for stable backporting. Morton complained, noting that he had concluded that the patch should not be backported, but that backport had happened anyway.

Not that long ago, the lack of an explicit tag would prevent a patch from being backported to the stable releases, but the situation has changed somewhat in recent years. Along with many of the other changes in that set of especially large stable kernel updates, Nesterov's patch had been automatically selected for backporting by Sasha Levin's machine-learning system. Greg Kroah-Hartman suggested that concerned developers and users should have noticed this patch and complained before it was shipped: "This came in through Sasha's tools, which give people a week or so to say 'hey, this isn't a stable patch!' and it seems everyone ignored that". The implication is that, had people been paying attention, this regression would not have found its way into the stable updates.

The patch in question was flagged for backporting as part of a set of 304 selected for 4.20 on January 28. It then found its way into the 4.20.8 review notification on February 11. That stable-release cycle gave developers and users a mere 352 patches to look over, but perhaps some understanding can be extended to those who didn't quite manage to evaluate the whole set in time. In truth, of course, there is little chance that anybody can truly look at that patch volume (multiplied by several major releases receiving stable updates at the same time) and pick out the bad patch. So some developers, such as Michal Hocko, have said (again) that the process of moving patches into stable releases should be slower, perhaps waiting until those patches have appeared in a major release from Torvalds. That is especially true, he said, of the "nice-to-have" patches that don't address problems users are complaining about.

Levin does not think that will help:

The fact is that many patches are not tested until they get to stable, whether we add them the same week they went upstream or months later. This is a great case for this: I doubt anyone but NixOS does this crazy thing with shebang lines, so who else would discover the bug?

As a general rule, that might even be true, but it happens to not be in this case: the NixOS developers discovered the problem on January 8, and filed a report in the kernel bugzilla on February 2. The commit causing the problem had been identified (through bisection) on February 3. Shipping the regression in the stable updates had nothing to do with its discovery and reversion, in other words — the problem had already been identified well before the stable kernels shipped it.

Even so, Levin remains adamant that the process of automatically selecting patches for backporting is the right thing to do:

The approach of manually deciding if a patch needs to go in stable is wrong and it doesn't scale. We need to beef up our testing story and make these decisions based off of that, and not our error-prone brains that introduced these bugs to begin with.

This is undoubtedly an issue that will arise again; there are a great many fixes going into the kernel, and users of stable kernels (almost all of us) benefit from getting those fixes. But there are clearly some things that can be improved here. There was no test for this particular regression because it had never occurred to anybody that things could break in that way; we now know better, but no tests have been added yet. A kernel bugzilla instance that doesn't prevent a known-bad patch from getting into a stable release is clearly not doing its job; the kernel community as a whole lacks a convincing story on how bugs should be reported and tracked. The kernel development process works well in many ways, but that does not mean that it is without some glaring problems.

The case of the supersized shebang

Posted Feb 18, 2019 19:11 UTC (Mon) by TheJH (subscriber, #101155) [Link]

> By longstanding Unix convention, an attempt to execute a file that does not have a recognized binary format will result in that file being passed to an interpreter. By default, the interpreter is a shell, which will interpret the file as a shell script.

This part is implemented in userspace, not in the kernel; e.g. bash seems to implement this in shell_execve(), and glibc implements it for execlp() and execvp() in __execvpe_common(). Both of them do this when -ENOEXEC is returned by execve(). So if the kernel wants to hard-fail the execution of a file without letting userspace potentially fall back to script execution, it has to return a different error.

The case of the supersized shebang

Posted Feb 18, 2019 19:55 UTC (Mon) by jspenguin (subscriber, #120333) [Link]

It does implement it in the kernel. The glibc fallback is only for the case where there is no #! line, where bash will automatically assume it's a shell script, fork itself (without exec), and run the script under the child.

You can verify the actual results of the system calls with strace. In the case of a script with a shebang, the execve call itself succeeds, indicating it was handled by the kernel.

The case of the supersized shebang

Posted Feb 18, 2019 19:58 UTC (Mon) by TheJH (subscriber, #101155) [Link]

I guess I should have written this more clearly; I was referring to the last quoted sentence, "By default, the interpreter is a shell, which will interpret the file as a shell script.".

The case of the supersized shebang

Posted Feb 19, 2019 18:49 UTC (Tue) by dona73110 (subscriber, #113155) [Link]

Yeah, that line

>By default, the interpreter is a shell, which will interpret the file as a shell script

is not really correct; it's a shell convention but the kernel will definitely _not_ use a default interpreter if the file does not start with #!

The case of the supersized shebang

Posted Feb 20, 2019 8:13 UTC (Wed) by epa (subscriber, #39769) [Link]

Isn’t there an ancient convention, predating shebang lines, that if it doesn’t look like a binary executable it gets run through /bin/sh by default?

The case of the supersized shebang

Posted Feb 20, 2019 11:19 UTC (Wed) by epa (subscriber, #39769) [Link]

OK, according to the historical lore linked later in this discussion, it's the shell that is responsible for treating an executable as a shell script if it looks like one. Also Perl's exec() call does it. But the raw system call and C library do not: trying to execl() an executable file which just contains 'echo hello' will fail. It needs the shebang line.

The case of the supersized shebang

Posted Feb 20, 2019 11:55 UTC (Wed) by pebolle (subscriber, #35204) [Link]

Michael Kerrisk's TLPI (p. 575) points to execlp() and execvp() for that behaviour. A quick test showed that execlp() indeed treats such an executable file "as though [it] started with a line containing the string #!/bin/sh".

The case of the supersized shebang

Posted Feb 20, 2019 22:08 UTC (Wed) by foom (subscriber, #14868) [Link]

Some shells *also* handle this themselves.

E.g., with bash, if execve fails with -ENOEXEC, it will reset all the shell state and evaluate the #!-less script file directly in the post-fork bash process, rather than exec'ing anything at all (neither /bin/sh nor /bin/bash!)

The case of the supersized shebang

Posted Feb 20, 2019 22:32 UTC (Wed) by pebolle (subscriber, #35204) [Link]

I see. So it seems - or, put another way, strace showed me - that this behaviour of /bin/sh is what allows execlp() to do its, well, magic.

The case of the supersized shebang

Posted Feb 22, 2019 8:31 UTC (Fri) by epa (subscriber, #39769) [Link]

Sounds like a handy trick to speed up your shell scripts!

The case of the supersized shebang

Posted Feb 20, 2019 1:55 UTC (Wed) by scientes (subscriber, #83068) [Link]

I came here to post the same thing. The article really should be fixed. Linux has nothing shell specific, and one of systemd's goals was to remove that dependency, so reading this kind of irked me (really?????).

The case of the supersized shebang

Posted Feb 18, 2019 19:20 UTC (Mon) by fuhchee (subscriber, #40059) [Link]

If you offload decisionmaking to a machine, and don't spend time supervising & educating the machine on an ongoing basis, don't be surprised at surprising results. Given the security vulnerability value of getting a malevolent patch into the -stable series, having the model available to play with allows adversaries to tune patches to pass that filter, and hide in the mob.

The case of the supersized shebang

Posted Feb 20, 2019 9:35 UTC (Wed) by meuh (subscriber, #22042) [Link]

Artificial intelligence gives artificial answers

Wrong interpreter

Posted Feb 18, 2019 19:53 UTC (Mon) by rfunk (subscriber, #4054) [Link]

So if I understand the situation correctly, this patch was intended to avoid *maybe* getting the wrong interpreter (a situation that Unix users have known about for some 40 or 50 years) by falling back to the default shell -- which is almost certainly the wrong interpreter!

If someone wanted to fix this longstanding misbehavior, the sensible approach would be to expand the interpreter-string buffer (maybe dynamically), not make over-long interpreter strings cause a fallback to a completely different interpreter.

Wrong interpreter

Posted Feb 18, 2019 21:16 UTC (Mon) by jhoblitt (subscriber, #77733) [Link]

The 128 char limit sounds like a lot but it has caused a fair amount of grief over the years with build/CI systems with deep directory structures that host a copy of an interpreter in the bottommost leaves. It would be fantastic if this could be increased, perhaps via a sysctl that defaults to the old 128 char limit.

Wrong interpreter

Posted Feb 18, 2019 22:20 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

It's not just that they might get the wrong interpreter or (potentially more worrisome) get the right interpreter with a mangled set of parameters. It's that at least some of the interpreters are smart enough to re-read the first line just in case it was truncated. Honestly, re-reading the shebang line seems like the right thing for the interpreter to do. The line is right there in the script that the interpreter has to read anyway, and it's documented that at least one operating system will mangle the line if it's too long. Why wouldn't an interpreter reread it?

Wrong interpreter

Posted Feb 19, 2019 3:25 UTC (Tue) by gdt (subscriber, #6284) [Link]

From the interpreter author's point of view, they probably encountered this issue as non-program text appearing before the program text; their initial interest was unlikely to be around support of long argument lines but dealing with a compile bug.

The interpreter authors were doing the 'right' thing: using the system library as intended and documented in the manual. That's why it doesn't initially occur to interpreter authors to re-construct the command line from argv and from the program text and then use a non-system parser to re-parse the command line options.

Wrong interpreter

Posted Feb 19, 2019 14:01 UTC (Tue) by NAR (subscriber, #1313) [Link]

The example Erlang script (escript) starts like this:

#!/usr/bin/env escript
%% -*- erlang -*-
%%! -smp enable -sname factorial -mnesia debug verbose

The first line is the portable shebang. The second is a simple comment (maybe interpreted by emacs?). The third line also looks like a comment, but it is parsed by the interpreter and contains arguments for the interpreter. I guess it is done this way because of the limitations of /usr/bin/env, which cannot pass arguments directly to the interpreter. This seems to be a sensible approach.

Wrong interpreter

Posted Feb 19, 2019 18:30 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

-*- language -*- close to the start of some text file causes Emacs to switch to the major mode for language.

Wrong interpreter

Posted Feb 19, 2019 15:08 UTC (Tue) by anton (subscriber, #25547) [Link]

Gforth treats #! as starting a comment line.

Why not reread the #! line and interpret it? Because treating it as a comment is simpler, by quite a lot. Consider a gforth script named "script" that is invoked with

./script scriptarg1 scriptarg2

and starts with

#! /usr/bin/gforth gfortharg

Then the kernel constructs the following command line:

/usr/bin/gforth gfortharg ./script scriptarg1 scriptarg2

Note that gforth processes gfortharg before the rest of the command line. Then, when it sees ./script, and the #! at its start, it might start doing something about this line. But why, given that the kernel already does it nicely (well, in most cases at least)?

One advantage of doing your own processing is that one could have more than one argument on the #! line, as demonstrated in the Perl case, but for now the pain has not been big enough to go to these lengths.

Wrong interpreter

Posted Feb 19, 2019 19:00 UTC (Tue) by dona73110 (subscriber, #113155) [Link]

> So if I understand the situation correctly, this patch was intended to avoid *maybe* getting the wrong interpreter (a situation that Unix users have known about for some 40 or 50 years) by falling back to the default shell -- which is almost certainly the wrong interpreter!

That's what corbet said but it's not exactly what's happening - if you look at the patch there's no default to a shell, it just returns an error that the file was not executable.

The shell is the one that takes this "can't execute" error as a cue to interpret the file as a shell script instead :-x

Wrong interpreter

Posted Feb 19, 2019 19:11 UTC (Tue) by rfunk (subscriber, #4054) [Link]

Wherever the shell fallback is implemented, the result is the same, so that seems irrelevant unless you're also going to patch all the shells that implement that fallback.

Wrong interpreter

Posted Feb 20, 2019 6:49 UTC (Wed) by marcH (subscriber, #57642) [Link]

Falling back on the default shell is not the same as falling back on the current shell.

The case of the supersized shebang

Posted Feb 18, 2019 20:57 UTC (Mon) by zblaxell (subscriber, #26385) [Link]

#! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl
-I/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl
-I/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl
-I/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl
-I/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl

It seems odd to do that on the command line, when Perl has things like BEGIN{} blocks and @INC to manipulate the interpreter during the parse/compile phase. Do other languages do what Perl does, or does NixOS do something else when the language isn't Perl?

The case of the supersized shebang

Posted Feb 18, 2019 21:41 UTC (Mon) by jccleaver (subscriber, #127418) [Link]

perl has a very, very long tradition of DWIM-itude and silently adjusts for all sorts of weird behavior.

I would not be surprised if this dated from the perl4 era or before, when one of the benefits of perl over shell was that it was a bit more predictable and forgiving than dealing with the *nix Wars' differing interpretations of sh/csh/ksh/tcsh/etc...

The case of the supersized shebang

Posted Feb 19, 2019 10:07 UTC (Tue) by grawity (subscriber, #80596) [Link]

It gets better. If you give perl a script whose shebang line does not actually specify perl – e.g. if you accidentally run perl ~/myscript.py and that file specifies /usr/bin/python as interpreter – perl will helpfully run the script via /usr/bin/python for you.

The case of the supersized shebang

Posted Feb 24, 2019 9:17 UTC (Sun) by flussence (subscriber, #85566) [Link]

For a long time perl *also* had a bug where if the line looked like /path/to/perl$anyversion, it'd try to run it directly. That meant perl5 wouldn't run perl6 scripts at all, and probably would also break in weird situations with two perl5 environments on the system. It was fixed very recently, IIRC.

The case of the supersized shebang

Posted Feb 19, 2019 16:15 UTC (Tue) by roblucid (subscriber, #48964) [Link]

As a sysadmin who used Perl 4 there were key advantages over shell, otherwise some utilities would have to be C. Perl was much faster than shell by reducing fork and calls to utilities, it gave access to the C library, so could read system DBM files for example directly.

Indeed it re-read the #! line, IIRC to allow perl flags to be set and magic with env as perl at that time wasn't established in /usr/bin/perl.
Perl had extreme portability, it would use some tricks to work on 'broken' systems too.

The later Perl 5 gave more flexibility, but using dynamic libraries made the installation more fragile.

The case of the supersized shebang

Posted Feb 19, 2019 7:01 UTC (Tue) by epa (subscriber, #39769) [Link]

Pasting together a command line with -I and other flags is something you can do programmatically and reliably, but editing the program text itself to add BEGIN blocks or whatever is much hairier.

The case of the supersized shebang

Posted Feb 19, 2019 17:02 UTC (Tue) by zblaxell (subscriber, #26385) [Link]

Not really? Assuming it's a one-time adjustment during install, you just prepend, and in many cases you don't even need BEGIN:

#! /nix/store/mbwav8kz8b3y471wjsybgzw84mrh4js9-perl-5.28.1/bin/perl
use lib qw(
/nix/store/x6yyav38jgr924nkna62q3pkp0dgmzlx-perl5.28.1-File-Slurp-9999.25/lib/perl5/site_perl
/nix/store/ha8v67sl8dac92r9z07vzr4gv1y9nwqz-perl5.28.1-Net-DBus-1.1.0/lib/perl5/site_perl
/nix/store/dcrkvnjmwh69ljsvpbdjjdnqgwx90a9d-perl5.28.1-XML-Parser-2.44/lib/perl5/site_perl
/nix/store/rmji88k2zz7h4zg97385bygcydrf2q8h-perl5.28.1-XML-Twig-3.52/lib/perl5/site_perl
);

possibly adjusted to deal with the subtle differences between BEGIN { unshift(@INC, 'foo') }, use lib qw(foo), and -Ifoo. I remember there are some, but I don't remember what they are.

IIRC Perl 4 doesn't have 'use' or 'BEGIN', so perl4 definitely needs the shebang hack...but perl4 also implements a lot of bizarre stuff like "execute perl embedded in non-MIME-encoded email message", and...not the shebang hack (or at least it's not documented). Perl4 and NixOS are separated by a decade, so I doubt perl4's limitations had much to do with NixOS's design; however, early-2000's bugs in the Perl5 implementation could have been a problem for a project starting in 2003 (I know it was a problem with several of mine!).

Anyway, the answer I was looking for is apparently "only Perl and Guile do that, and NixOS does it for historical reasons, and they're going to stop now."

The case of the supersized shebang

Posted Feb 19, 2019 10:15 UTC (Tue) by zoobab (subscriber, #9945) [Link]

At least Nix knows how to do reproducibility.

The case of the supersized shebang

Posted Feb 18, 2019 21:16 UTC (Mon) by flussence (subscriber, #85566) [Link]

What a mess. Several failures had to align perfectly for this to happen:

1. NixOS doing weird things that only work because the Perl 5 runtime contains a hack to compensate for them (what do they do for other languages without a similar hack, and why is this one different?)
2. The kernel using a fixed-size buffer with a weird legacy size (zero relation to PATH_MAX or ARG_MAX)
3. There were no regression tests for this part of the kernel yet
4. The kernel Bugzilla is functionally a bitbucket in a separate universe from normal kernel development
5. As others have pointed out, the backporting process is only as smart as Youtube's recommendation algorithm

Probably a few more I've overlooked too. After all that, we're left with only a revert and a vague promise to fix #3. That seems like a weak response at best.

The case of the supersized shebang

Posted Feb 18, 2019 21:54 UTC (Mon) by grahamc (guest, #111068) [Link]

> 1. NixOS doing weird things that only work because the Perl 5 runtime contains a hack to compensate for them (what do they do for other languages without a similar hack, and why is this one different?)

Our Perl tooling is an older part of the NixOS ecosystem, and has received a bit less attention lately when compared to other languages.

Perl's hack is not to compensate for us exactly, but presumably someone long ago who was bitten by the same problem. This hack existing is the reason we've allowed such a weird shebang to exist: it allowed us to and didn't cause us problems. Frankly, nobody noticed the shebang until the kernel broke it.

We handle other languages much more sanely.

> 4. The kernel Bugzilla is functionally a bitbucket in a separate universe from normal kernel development

Yes, this was a bit of a frustrating and troubling learning curve for us.

The case of the supersized shebang

Posted Feb 18, 2019 21:56 UTC (Mon) by grahamc (guest, #111068) [Link]

Oops, I meant to expand my reply to include: We're now fixing how Perl dependencies are handled, to not depend on gigantic shebangs.

The case of the supersized shebang

Posted Feb 18, 2019 22:25 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

There isn't really anything wrong with "depending on gigantic shebangs" with Perl as the perl program is supposed to handle this case to enable consistent behaviour regardless of #!-implementation limits like processing only one argument directly attached to the #! or truncating the line prior to parsing it (the perlrun manpage mentions a historic 32 character limit).

A 127 character limit is documented behaviour for #! and Linux execve (the final byte being occupied by a 0).

The patch (and everyone involved with that) is entirely to blame here as it changes documented, observable behaviour based on speculations about possible errors based on the wrong assumption that nothing but the kernel ever interprets the content of a #!-line. But perl has been interpreting this line since at least 5.004, probably longer (that's the earliest example I know of).

The case of the supersized shebang

Posted Feb 18, 2019 22:08 UTC (Mon) by joncb (subscriber, #128491) [Link]

That was pretty much my takeaway too.

Fixing #4 seems like the way to go here. If the bug report managed to reach the right ear then the stable thing wouldn't have happened. Bonus points if this didn't require human input.

There's a possible angle around having an explicit negative tag (i.e. "do not backport this") so that the system can distinguish between "human has decided this patch should not be backported" and "human has not evaluated this patch for backporting".

The case of the supersized shebang

Posted Feb 18, 2019 22:32 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> 2. The kernel using a fixed-size buffer

That's not even slightly unusual. There are tons of places where syscalls and other kernel interfaces impose arbitrary limits on the sizes of things. Just about the only major exceptions are read()/write()-style interfaces such as file-likes and sockets.

> with a weird legacy size (zero relation to PATH_MAX or ARG_MAX)

Frankly, I agree with the kernel here. Shebang lines are a convenience, and more importantly they're intended for use by the (userspace components of the) distro, and are accidentally useful to end users, but random other developers have (IMHO) no business writing shebang lines. The distro knows where Python is installed, so that it doesn't need to write #!/usr/bin/env python to force a PATH search. The distro has the prerogative (and responsibility) to stick interpreters in sensible directories like /usr/bin instead of some crazy deep path under /opt, so that short truncation is not a problem either. In short, shebangs are remarkably well-suited to the specific problems that distros need to solve when implementing standard binaries using a custom interpreter.

The fact that shebangs also happen to be useful to end users sticking handy scripts in ~/bin is purely an accident. The idea that shebangs can or ought to be "portable" boggles the mind; just write a wrapper shell script. It's cleaner, more portable, and involves substantially less abuse of /usr/bin/env and other such "clever tricks."

The case of the supersized shebang

Posted Feb 18, 2019 22:48 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

> Frankly, I agree with the kernel here. Shebang lines are a convenience, and more importantly they're intended for use
> by the (userspace components of the) distro, and are accidentally useful to end users,

https://www.in-ulm.de/~mascheck/various/shebang/4.0BSD_ne...

The assumption that people who aren't "involved with a distro" must be "end users" (who have no business programming anything) is ... interesting ...

The case of the supersized shebang

Posted Feb 19, 2019 1:24 UTC (Tue) by KaiRo (subscriber, #1987) [Link]

IMHO, even assuming that "end users" have no business programming is, erm, interesting, as I thought one point of FLOSS was that *anyone* can start programming and contributing.

The case of the supersized shebang

Posted Feb 19, 2019 9:36 UTC (Tue) by brooksmoses (subscriber, #88422) [Link]

I think you're misunderstanding NYKevin's point -- he specifically said "writing shebang lines", not "programming".

If you're not writing code that's part of a distro's userspace, you should (generally) be writing code to be portable across different distros. (Even if you're an end-user with only one computer, you may well change your distro in the future and want to run your old code.) However, a shebang line is not reliably portable -- to pick an example that I've recently been having pains with, "#!/usr/bin/python2" assumes that your distro has Python version 2 installed in /usr/bin/python2. If you instead have only /usr/bin/python2.7, or /usr/local/bin/python2, or something else, then it doesn't work. And shebangs don't follow the PATH, so you have to specify an absolute path.

In the particular case I was having pains with, the relevant code was in a testsuite, and it would have been much nicer to have Autoconf find the right interpreter and then call it explicitly via something like "$PYTHON testfile.py".

The case of the supersized shebang

Posted Feb 19, 2019 13:48 UTC (Tue) by NAR (subscriber, #1313) [Link]

> However, a shebang line is not reliably portable

I guess that's what /usr/bin/env was invented for...

The case of the supersized shebang

Posted Feb 19, 2019 16:36 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

Sort-of. It does a PATH search internally. This may find some perl executable and it may even be the one supposed to execute a particular script. However, it might as well not. perl is a fairly stable execution environment, however, it has its share of "Let's break working stuff because it's just WRONG!" (that it works, presumably :->) people who add essentially random code changes[*] whose purpose doesn't seem to be known to anyone.

[*] Eg, starting with perl 5.16, xs functions can't be loaded via Dynaloader anymore unless a new keyword is added to the existing code. The documentation makes it very clear that loading xs-functions via Dynaloader Is Just Wrong[tm], but there's no positive justification for the change.

The case of the supersized shebang

Posted Feb 19, 2019 16:46 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

No, that is not what /usr/bin/env was invented for. /usr/bin/env was invented for setting up different environment variables - that's why it's called "env."

The case of the supersized shebang

Posted Feb 19, 2019 15:11 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

He wrote about "distro developers" vs "end user sticking scripts in ~/bin". This is a false dichotomy, so false that it's laughable; all features of the system are ultimately supposed to be useful to users.

*If* you want to run other people's code, don't stick interpreters in /opt/var/local/ultima-thule/lib/bin/asterisk/moff just because this makes perfect sense to you.

The case of the supersized shebang

Posted Feb 19, 2019 17:01 UTC (Tue) by zblaxell (subscriber, #26385) [Link]

> However, a shebang line is not reliably portable

I, for one, would prefer to solve that problem.

i.e. have a namespace that I can reference from portable scripts, like "#!/prefix/org/python/python/2.7.5", where the python installation puts in as many symlinks as required to accurately state the level of backward compatibility achieved.

The case of the supersized shebang

Posted Feb 19, 2019 19:38 UTC (Tue) by jccleaver (subscriber, #127418) [Link]

> Frankly, I agree with the kernel here. Shebang lines are a convenience, and more importantly they're intended for use by the (userspace components of the) distro, and are accidentally useful to end users, but random other developers have (IMHO) no business writing shebang lines.

That's... kind of insane. Is this point of view common? That might explain all the scripts I see written by devs (often java devs, but not always) that are kept chmod 644 and have no shebang.

The first thing I do is make them executable and put the proper shebang in, unless there's some specific external $PATH call I need to wrap into it, which is just dumb... or weird.

This actually feels like a decent shibboleth among *nix admins and... non-*nix developers.

The case of the supersized shebang

Posted Feb 18, 2019 22:39 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

> Several failures had to align perfectly for this to happen:

This is very often the case. A functioning quality control system has multiple safeguards to prevent defective products from being released, so any time one is released it must be the result of multiple failures. That's why it's important to look at all the points of failure and try to fix them all rather than laying the blame on a single scapegoat.

The case of the supersized shebang

Posted Feb 18, 2019 21:59 UTC (Mon) by aszlig (subscriber, #91735) [Link]

Note that the actual problem here is not the ENOEXEC by itself, because the actual interpreter fits the 128 character limit. So returning ENOEXEC if the interpreter (but not the args) exceeds the limit is fine (I think FreeBSD does it that way IIRC). The point where this was breaking things for us (and possibly other users) is that it was returning ENOEXEC if the *whole* shebang (including args) exceeded the limit.

Given that, I still wonder why we were the only ones hitting that bug so far, because I don't think Perl (and others) would add such a workaround if there were users hitting that issue.

The case of the supersized shebang

Posted Feb 18, 2019 22:44 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

The perlrun manpage (mentioned above by rweikusat2) says "Because historically some operating systems silently chopped off kernel interpretation of the #! line after 32 characters, some switches may be passed in on the command line, and some may not; you could even get a "-" without its letter, if you're not careful." So the systems this was added for had one-fourth the limit Linux has, so it showed up much more often on them.

The case of the supersized shebang

Posted Feb 18, 2019 22:53 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

I suspect that Perl has to include their workaround because there's more than one way to invoke a perl script. They can be invoked either directly from the command line or by calling the interpreter with the script as an argument. Because the arguments on the shebang line still have to work even if the script is called as an argument to the interpreter, the interpreter needs to be able to read them. Automatically reading them no matter how the script is called seems like the logical approach to guarantee consistent behavior.

The case of the supersized shebang

Posted Feb 19, 2019 2:06 UTC (Tue) by pabs (subscriber, #43278) [Link]

Putting options into the shebang is generally not a good idea anyway, since someone could be running the script via the interpreter instead of running the script directly and the interpreter might not read the shebang to find its options.

How many interpreters have special handling of the shebang like Perl does?

The case of the supersized shebang

Posted Feb 19, 2019 4:34 UTC (Tue) by anguslees (subscriber, #7131) [Link]

It's much less common with the "modern" scripting languages (python/ruby), since they're usually treated as full-featured programming environments quite separate to any functionality they have as an interactive shell experience.

Guile is pretty close, with a "\" "meta switch" and defining #! ... !# as comment delimiters. You can do this:

#!/usr/local/bin/guile \
-e main -s
!#
(define (main args)
  (display "hello, world!")
  (newline))

Related, but different:

Many other "scripting" languages use # as a comment character and can do some similar shell bootstrap hack too. Most of these are used to look up the language interpreter in the path, knowing only that /bin/sh exists.

For some reason I never understood, the modern world has decided it is better to assume that /usr/bin/env exists instead (using env in this way still doesn't allow multiple args, unless you're using FreeBSD's env -S magic arg - also in coreutils 8.30 apparently).

Perl:

#!/bin/sh
#! -*-perl-*-
eval 'exec perl -x -wS $0 ${1+"$@"}'
    if 0;
# perl code continues here

Python:

#!/bin/sh
''':'
if type python2 >/dev/null 2>/dev/null; then
  exec python2 "$0" "$@"
else
  exec python "$0" "$@"
fi
'''
# Python code continues here

(and similar hacks for awk, ruby, etc)

The case of the supersized shebang

Posted Feb 19, 2019 10:14 UTC (Tue) by dottedmag (subscriber, #18590) [Link]

/usr/bin/env is always there, unlike other interpreters which may end up in /usr/bin, /usr/local/bin or, say, /opt/bin, depending on what the user installed on their machine (e.g. macOS MacPorts vs. Homebrew).

The case of the supersized shebang

Posted Feb 19, 2019 11:08 UTC (Tue) by Baughn (subscriber, #124425) [Link]

Even NixOS has /usr/bin/env, as the only file in /usr.

The case of the supersized shebang

Posted Feb 19, 2019 13:58 UTC (Tue) by DrHyde (guest, #130508) [Link]

/usr/bin/env is *not* always there. Some older or more obscure platforms have it elsewhere. IIRC I've run across this problem on SunOS and Unicos.

The case of the supersized shebang

Posted Feb 19, 2019 16:01 UTC (Tue) by dbaker (subscriber, #89236) [Link]

One of my projects has some interesting hacks because Haiku has /bin/env (no /usr at all, IIRC)

The case of the supersized shebang

Posted Feb 19, 2019 17:21 UTC (Tue) by DrHyde (guest, #130508) [Link]

I believe that http://catb.org/jargon/html/V/vaxocentrism.html applies, except with modern Linux instead of VAX.

The case of the supersized shebang

Posted Feb 19, 2019 13:23 UTC (Tue) by smurf (subscriber, #17840) [Link]

Those hacks are (a) wasteful (an extra interpreter needs to start up) and (b) fragile: Python, for instance, requires "from __future__ import …" statements to be the first statement in a file. Owch. "ln -s /bin/true /usr/bin/from", anybody?

The case of the supersized shebang

Posted Feb 19, 2019 16:16 UTC (Tue) by tux3 (subscriber, #101245) [Link]

Experimentally "from __future__ import ..." seems to work fine with a #!, or even random comments before.
It seems python is happy as long as it's the first line actually executed, though the error message is confusing.

The case of the supersized shebang

Posted Feb 19, 2019 16:17 UTC (Tue) by tux3 (subscriber, #101245) [Link]

Oh I just realized what message you were replying to, my mistake.

The case of the supersized shebang

Posted Feb 19, 2019 12:54 UTC (Tue) by bgoglin (subscriber, #7800) [Link]

typo in 1st sentence of 3rd paragraph: "to be be 128"

The case of the supersized shebang

Posted Feb 19, 2019 13:55 UTC (Tue) by tome (subscriber, #3171) [Link]

A bit OT but...

I read the whole article and never noticed that typo until seeing your comment. So it seems my brain did a perly DWIM on that sentence, or rather, RWTAM -- read what the author means.

Typo

Posted Feb 19, 2019 14:06 UTC (Tue) by corbet (editor, #1) [Link]

It's always the last change that gets you ... fixed.

For future reference, we prefer to get typo reports at lwn@lwn.net; that avoids cluttering the comment section with them.

Typo

Posted Feb 19, 2019 18:49 UTC (Tue) by tome (subscriber, #3171) [Link]

Uh oh, I cluttered the clutter... and now this. I'd best be quiet.

New version of the patch

Posted Feb 19, 2019 15:03 UTC (Tue) by corbet (editor, #1) [Link]

I guess I was wrong about having to wait until 5.1 to try again; this patch was just merged for 5.0. It refuses to try to run a truncated interpreter name, but otherwise preserves the old behavior.
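The three behaviors discussed in this thread (silent truncation, rejecting any over-long line, and rejecting only a possibly truncated interpreter name) can be compared with a small model. The Python sketch below is a simplification written for illustration, not the kernel's code: only the 128-byte buffer size and the ENOEXEC error come from the discussion, while load_script, the policy names, and the exact boundary handling are assumptions.

```python
import errno

BUF_SIZE = 128  # buffer size given in the article; the rest is a simplified model


def load_script(content, policy="truncate"):
    """Return (interpreter, argument) or raise OSError(ENOEXEC).

    policy "truncate":    silently cut an over-long line (old behavior)
    policy "strict":      reject any line that does not fit the buffer
    policy "interp-only": reject only when the interpreter name may be cut
    """
    if not content.startswith(b"#!"):
        raise OSError(errno.ENOEXEC, "not a script")
    buf = content[:BUF_SIZE]
    newline = buf.find(b"\n")
    # The line is truncated if its end is not visible inside the buffer.
    truncated = newline == -1 and len(content) > BUF_SIZE
    raw = (buf[2:] if newline == -1 else buf[2:newline]).lstrip(b" \t")
    # The interpreter name ends at the first space or tab, if there is one.
    end = next((i for i, c in enumerate(raw) if c in b" \t"), len(raw))
    name, arg = raw[:end], raw[end:].strip(b" \t")
    # The name is known to be complete if a separator follows it in the buffer.
    name_complete = end < len(raw) or not truncated
    if not name:
        raise OSError(errno.ENOEXEC, "no interpreter")
    if truncated and policy == "strict":
        raise OSError(errno.ENOEXEC, "line too long")
    if policy == "interp-only" and not name_complete:
        raise OSError(errno.ENOEXEC, "interpreter name truncated")
    return name.decode(), (arg.decode() or None)
```

In this model a script with a short interpreter path and a very long argument runs under "truncate" and "interp-only" (with the argument cut) but fails under "strict", while a script whose interpreter path alone overflows the buffer runs only under "truncate".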

The case of the supersized shebang

Posted Feb 19, 2019 17:21 UTC (Tue) by Anssi (subscriber, #52242) [Link]

> If, however, the file starts with the characters "#!", the remainder of the first line will be treated as the name of the interpreter to use (and possibly arguments to be passed to that interpreter).

Well, only a single argument, not "arguments".
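To illustrate the single-argument rule, here is a small Python sketch of how the argument vector could be assembled under it. The function name shebang_argv is hypothetical, and the splitting is a simplified model of the behavior described in this comment; other systems split the line differently.

```python
def shebang_argv(shebang_line, script_path, user_args):
    """Build the argv an interpreter would see under the single-argument rule."""
    body = shebang_line[2:].strip()
    # Split off the interpreter; everything after it stays ONE string.
    parts = body.split(None, 1)
    if not parts:
        raise ValueError("shebang line names no interpreter")
    argv = [parts[0]]
    if len(parts) > 1:
        argv.append(parts[1])
    return argv + [script_path] + list(user_args)
```

With the line "#!/usr/bin/env python -u", this model hands env the single string "python -u" as its first argument, which is why env-based shebang lines cannot carry interpreter options without something like env -S.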

The case of the supersized shebang

Posted Feb 19, 2019 23:19 UTC (Tue) by ewen (subscriber, #4772) [Link]

Leaving aside the initial regression (the #! line has always been prone to truncation; as other comments say it used to be truncated earlier in older Unixes, and older software like perl has had workarounds for this for decades), the discussion about the regression induced in *stable* kernels is more worrying. The discussion seems to be saying "whitelisting patches for the kernel stable tree is too hard and does not scale, so let's select them automatically and then let people blacklist the ones that shouldn't be backported". The "spotting ones to blacklist" seems even less likely to reliably scale, with more false negatives ("spot the problem patch amongst the hundreds automatically added this week" missing some that should have been flagged "don't backport").

If the "stable" kernels are going to have not-manually-selected/verified changes in them, and a shorter release cycle, it seems to me they'd increasingly become "alternative bleeding edge" kernels with older features, but "assorted newer patches added in for flavour." At which point I wonder who would run them? Those wanting actual stability probably end up relying on their distro kernel team's manual review, and those wanting the bleeding edge probably want the new features too. In other words the explicit manual selection of "stable" changes, and the QA, is what makes them "stable" kernels. Which seems increasingly not to be happening with the upstream kernel stable trees, because it's a lot of work.

As a sysadmin my desire for "stable" anything is (almost) no regressions, and some fixes for critical issues. Almost by definition with a preference for stability (ie, availability/reliability) over changes.

Ewen

The case of the supersized shebang

Posted Feb 19, 2019 23:47 UTC (Tue) by perennialmind (subscriber, #45817) [Link]

It looks like musl libc doesn't have the rickety fallback to crazytown. I for one hope they leave this POSIX compliance "bug" as is.

The case of the supersized shebang

Posted Feb 20, 2019 4:10 UTC (Wed) by jeremyhetzler (subscriber, #127663) [Link]

More on how shebang works, and gory details of how it differs in various systems:

https://www.in-ulm.de/~mascheck/various/shebang/

The case of the supersized shebang

Posted Feb 20, 2019 13:43 UTC (Wed) by smitty_one_each (subscriber, #28989) [Link]

It's a shame that something bad happened, but, considering the man-years of effort put into the kernel and the princely sums paid to so many of the contributors...it almost seems a left-handed compliment that such boo-boos are so rare.

Patch backports

Posted Feb 20, 2019 20:32 UTC (Wed) by Spack (subscriber, #77556) [Link]

How come a test kernel can feed a stable kernel with backported patches? I would have expected that only patches within the Linux stable kernel 5.0 would be allowed to be backported to previous versions?

Patch backports

Posted Feb 22, 2019 15:44 UTC (Fri) by nix (subscriber, #2304) [Link]

If that was the way it worked, then even serious bugs wouldn't get fixed more often than once every three months. That would render the stable kernels almost pointless, since you could always just upgrade to the also-stable Linus kernel the fixes came out of instead.

Patch backports

Posted Feb 23, 2019 21:32 UTC (Sat) by NAR (subscriber, #1313) [Link]

I guess people use -stable kernels because they explicitly do not want the latest version (due to fears about regressions, being locked to a version number, etc.). I don't know if they want a new -stable kernel every 3-4 days or just take a look at the top of their preferred -stable tree at their convenience (maybe once every quarter) and use that version.

Patch backports

Posted Feb 24, 2019 22:52 UTC (Sun) by nix (subscriber, #2304) [Link]

I use stable kernels because I want serious bugfixes, stability fixes, and security fixes but don't want to run -rc kernels on the systems my job depends on that house all my data thankyouverymuch. This... does not seem like a terribly unusual requirement, to me. I specifically do not avoid updating because I "don't want the latest version": if I could reliably update without rebooting I'd do it within minutes of every stable kernel coming out (but ksplice is fiddly for a self-compiled kernel and requires patch-by-patch analysis to determine which changes can be applied and kgraft is just as bad, AIUI: not a thing you can just throw a new stable kernel at and say "magically update me to this without rebooting").

Like most people operating small numbers of machines rather than huge failovered farms, I upgrade at irregular intervals, when a stable kernel with a bugfix seemingly serious enough to make it worth the annoyance-cum-terror of rebooting and flushing all my caches comes along -- though I suspect most people don't routinely read the git log and patch series of everything that hits -stable the way I do. (Rebooting is much less bad for performance than it used to be, thanks to bcache caching all the seeky metadata, but rebooting my core server is *always* terrifying: what if it never comes back up? It always has so far but this is a PC which means it's shit by definition, and I am not confident that all-Intel-mobo-plus-Intel-UEFI-only-one-corp-to-blame means it's reliable before the OS has started, not when I've *seen* the thing lock up once or twice when trying to enumerate its USB ports, exhaust some sort of watchdog timer, and autoreboot again before completing POST. I'm tempted to switch to kexec just to avoid most of that terror, but unfortunately kexec is even *less* routinely tested so the terror quotient would be greater. Yes of course I have backups, and backups of backups, and backups of backups of backups, but terror does not yield to common sense. I keep the ludicrous levels of backups anyway.)

Patch backports

Posted Mar 1, 2019 1:48 UTC (Fri) by flussence (subscriber, #85566) [Link]

I'm thinking the whole concept of having a -stable branch might be wrong if this is how it works in practice. The label “stable” is always going to be wishful thinking at best, since the halting problem applies.

One could argue it'd make sense to bless individual kernel versions as “stable” after a grace period passes with no complaints raised, but the existing process needs to be fixed so those complaints are heard before that can happen.

Patch backports

Posted Mar 1, 2019 16:30 UTC (Fri) by nix (subscriber, #2304) [Link]

That's more or less what I do locally, with most of my systems containing no real persistent state (either all contents are rsynced nightly from the big network-critical server, or they're just outright NFS-mounted from it) and running latest stable and sometimes random rcs with local hacks... and said server running a "stable stable" which survived at least a couple of weeks on the other boxes, with obviously-crucial security fixes cherry-picked back into that if need be. (I tried the distributed systems approach, having N of every distributable crucial service on different machines, and found that unless you spent *ages* analyzing you ended up introducing N single points of failure rather than just one, so I've gone back to the "one great big single point of failure which is at least easy to identify" approach. It is also too heavy to lift and won't fit through the door so is probably fairly theft-proof.)

This approach seems to work. I haven't lost any filesystems on the big server for, oh, almost a year now! :P

The case of the supersized shebang

Posted Feb 20, 2019 22:59 UTC (Wed) by neilbrown (subscriber, #359) [Link]

Time for a new patch tag I guess:

Fixes: NOTHING - don't ever back-port this to stable

The case of the supersized shebang

Posted Feb 21, 2019 15:02 UTC (Thu) by rweikusat2 (subscriber, #117920) [Link]

......... :-))

binfmt_misc

Posted Feb 21, 2019 8:24 UTC (Thu) by mm7323 (subscriber, #87386) [Link]

Couldn't you just use binfmt_misc to match on #! and then run a process that can open up the script and figure out the required interpreter and arguments in userspace?

That should surely be possible for a distro that wants to do unusual things and only costs an extra exec() for starting up scripts which aren't known to be that fast anyway.
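The user-space half of that idea could look roughly like the Python sketch below. It assumes the handler is given the script's path as an argument; how such a handler would be registered, and how it would interact with the kernel's built-in script handling, is not shown and would need checking. The name resolve_script and the use of shell-style word splitting are illustrative choices, not an existing tool.

```python
import shlex


def resolve_script(script_path):
    """Return the argv to exec for `script_path`, based on its first line."""
    with open(script_path, "rb") as f:
        first = f.readline()  # no fixed-size buffer, so no truncation
    if not first.startswith(b"#!"):
        raise ValueError("no shebang line")
    # Shell-style splitting allows several arguments, unlike the kernel's rule.
    words = shlex.split(first[2:].decode())
    if not words:
        raise ValueError("empty shebang line")
    return words + [script_path]
```

A real handler would finish by passing the resulting list, plus any arguments the user supplied, to something like os.execvp(), which accounts for the extra exec() mentioned above.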