(cache) When Bash Scripts Bite :: Jane Street Tech Blogs

When Bash Scripts Bite

Thu, 2017/05/11 - 3:02pm - Todd Lubin | Systems

There are abundant resources online trying to scare programmers away from using shell scripts. Most of them, if anything, succeed in convincing the reader to blindly put something that resembles

set -euo pipefail

at the top of their scripts. Let's focus on the "-e" flag. What does this do? Well, here are descriptions of this flag from the first two results on Google for "writing safe bash scripts":

"If a command fails, set -e will make the whole script exit, instead of just resuming on the next line" (https://sipb.mit.edu/doc/safe-shell/)
"This tells bash that it should exit the script if any statement returns a non-true return value." (http://www.davidpashley.com/articles/writing-robust-shell-scripts/)

Unfortunately, this is bash we are talking about and the story is never that simple.

A couple months ago, a particular production bash script (if that doesn't sound horrifying, hopefully it will by the end of this post) failed in the worst kind of way: silently. The script generates a list of valid users at Jane Street and pushes this out to our inbound mail servers. It looks something like:

set -euo pipefail
...
echo "($(ldap-query-for-valid-users))" > "/tmp/all-users.sexp"
...
push-all-users-if-different

On this one particular day, a file was deployed with the contents "()". But why didn't set -e cause the script to exit when ldap-query-for-valid-users failed? A quick look at the bash man page answers this question. It turns out that there are a couple surprising subtleties to this flag. Here are a couple:

set -e works on "simple commands"

A script will exit early if the exit status of a simple command is nonzero. So how is a simple command executed? In short, bash does all expansions and checks to see if there is still a command to run. If there is a command to run, the exit status of the simple command is the exit status of the command. If there is not a command to run, the exit status of the simple command is the exit status of the last command substitution performed. Here are some example commands that all have exit status 0, so would not cause a set -e script to exit:

# echo, local and export are commands that always have exit status 0
echo "$(/bin/false)"
local foo="$(/bin/false)"
export foo="$(/bin/false)"

# the last command substitution has exit status 0
foo="$(/bin/false)$(/bin/true)"

set -e does not get passed to subshells in command substitution (without --posix)

Here is an example consequence of this:

set -e

foo() {
    /bin/false
    echo "foo"
}
echo "$(foo)"

Running this script with bash will print "foo" while running this with bash --posix (or sh) will not. Both scripts will exit with status 0.

Tangible takeaway

This is not to say that something like set -euo pipefail should not be used at the top of all bash scripts, but it should not give you a false sense of security. Like all production code, you must reason about all failure conditions and ensure they are handled appropriately. Even if you are some kind of bash expert who knows all these subtleties, chances are your peers do not. The execution of shell scripts is subtle and confusing, and for production code, there is likely a better tool for the job.

(15) Comments

Michael Grünewald says:

May 11, 2017 at 4:33 pm

A general rule of thumb I find quite handy is to use only functions “that cannot fail” in $(…) expansions, because it can be quite difficult to handle the failure. So, generating JSON or S-EXPR payload from pre-computed data (either stored in variables or in file) is a good fit, but performing a request on a complex system is not.

Actually, the short snippet you posted suggests it has been written by someone who tries to use shell programming as a regular imperative programming language, therefore mimicking the constructs of such languages. The day I started to consider shell programming as a field on its own, I made significant progresses: mimicking constructs from imperative languages is almost always a failure. Not a blatant failure, but a nasty, hidden, failure, that silently bites. Here, the failure is to use the shell to do data processing, even in its easiest form like adding parenthesis around a string.

(I leave aside the fact that echo should never be used to print random text.) It is much more robust to use a function that reads a list of users and prints the corresponding S-EXPR.

For instance:

list2sexpr() { awk ' BEGIN { printf("(") } NR > 1 { printf(" ") } {printf("%s", $0)} END { printf(")") } ' }
ldap-query-for-valid-users | list2sexpr > "/tmp/all-users.sexp"

SED-ists can use the following alternative function

list2sexpr() { sed -n -e 'H;${x;s/\n/(/;s/$/)/;s/\n/ /g;p;q;}' }

Or PERL

list2sexpr() { perl -e '@a=<>; chomp @a; print "(", (join " ", @a), ")"' }

Of course any of the definitions of list2expr is more complicated, but it defines a small function reading some input and writing some output. However the resulting code

ldap-query-for-valid-users | list2sexpr > "/tmp/all-users.sexp"

is easier to read and catches failures.

I wrote 2 years ago a more substantial example (yet very short)

As an aside I would like to mention a not-so-well-known facts. Functions run in a subshell if they are defined with (…) instead of {…} which makes an easy and effective way to limit the scope of any shell falgs we want to use.

Reply
- Michael Grünewald says:
  
  May 11, 2017 at 4:40 pm
  
  (For some reason the formatting of my post is terrible and does not match at all the preview. I hope it can still be read!)
  
  Reply
Aaron says:

May 11, 2017 at 5:39 pm

the short snippet you posted suggests it has been written by someone who tries to use shell programming as a regular imperative programming language

So, probably a first time caller, but not a long-time listener of the show?

Reply
Tomi says:

May 12, 2017 at 1:30 am

we need more (ba)sh idioms

In addition to echo all command substitiutions like this seem to pass — now IIRC seen this before, but it always gets forgotten — thanks for reminding 😀

bash -eufc ': `/bin/false`'; echo $? 0

in this case:

query_output=$(ldap-query-for-valid-users) echo "($query_output)" > /tmp/all-users.sexp

if local, export, etc, desired, put those in separate expression. e.g.

local query_output; query_output=...
VAR=value; export VAR

on a related note try, bash -eufc 'foo=bar; export $foo=bar; echo $bar'

anyway, for “solution” one needs:

better test coverage (use bash -euc 'set -o pipefail; ...'; echo $?)
more checklists, idioms, etc. to look through when writing scripts.

Reply
- Tomi says:
  
  May 12, 2017 at 1:31 am
  
  hmm. markdown got lost in output :O
  
  Reply
  - Tomi says:
    
    May 12, 2017 at 1:32 am
    
    ok, it did not, it just looks different (highlighting), sorry about this (continuing) noise 😉
    
    Reply
Bjoern says:

May 12, 2017 at 6:32 am

For subshells, there is ‘set -E’

Reply
adam says:

May 12, 2017 at 7:06 am

Oh, I thought you were going to explain the impact that this had on your particular script, which is not entirely obvious from the snippets you gave. In the future this would be nice to bring things full circle and not leave the reader hanging. Was the result that all of your users got emailed unnecessarily? If so I think the issue is not the script itself but the underlying logic driving the script, which needs to be inverted.

Reply
Eman says:

May 12, 2017 at 7:54 am

Avoiding bash scripts all together as much as possible is one of those gifts that keeps on giving. Being very accessible, yet quirky and with limited potential makes it very dangerous. Just go search github for common bash errors and you’ll find plenty.

Reply
Benjamen R. Meyer says:

May 12, 2017 at 9:52 am

As with most programming, you really should be checking the error conditions at all times. That said – I tend to treat Bash like any other programming language – using it functions and variables to make the logic obvious, and the control functionality to check the values, almost always catching the exit code of a function or program call into a variable that is then checked.

Like with anything you can have very poorly written scripts (or programs). Those typically do not check errors, rely on exception handling at higher levels, and break easily in very non-obvious ways.

Thus:

function foo() {
    echo do something
    return 0
}

foo()
let -i result=$?
if [ ${result} -eq 0 ]; then
    echo “handle the success case”
else
    echo “handle the negative case(s)”
fi

is much preferable to:

if [ $(doSomeMassivePipedThingHere) ]; then
    echo “handle the success case”
fi

Reply
Charles Duffy says:

May 12, 2017 at 11:15 am

BashFAQ #105 — at http://mywiki.wooledge.org/BashFAQ/105 — is essential reading for anyone considering the use of set -e.

Reply
kwasi says:

June 20, 2017 at 3:11 am

I saw this program written entirely in bash. http://www.github.com/nshaibu/QBox. It is a VMM.

Reply

When Bash Scripts Bite

set -e works on "simple commands"

set -e does not get passed to subshells in command substitution (without --posix)

Tangible takeaway

(15) Comments

Leave a Reply Cancel reply