There are abundant resources online trying to scare programmers away from using shell scripts. Most of them, if anything, succeed in convincing the reader to blindly put something that resembles
set -euo pipefail
at the top of their scripts. Let's focus on the "-e" flag. What does this do? Well, here are descriptions of this flag from the first two results on Google for "writing safe bash scripts":
- "If a command fails,
set -e
will make the whole script exit, instead of just resuming on the next line" (https://sipb.mit.edu/doc/safe-shell/) - "This tells bash that it should exit the script if any statement returns a non-true return value." (http://www.davidpashley.com/articles/writing-robust-shell-scripts/)
Unfortunately, this is bash we are talking about and the story is never that simple.
A couple months ago, a particular production bash script (if that doesn't sound horrifying, hopefully it will by the end of this post) failed in the worst kind of way: silently. The script generates a list of valid users at Jane Street and pushes this out to our inbound mail servers. It looks something like:
set -euo pipefail
...
echo "($(ldap-query-for-valid-users))" > "/tmp/all-users.sexp"
...
push-all-users-if-different
On this one particular day, a file was deployed with the contents "()". But why didn't set -e
cause the script to exit when ldap-query-for-valid-users
failed? A quick look at the bash man page answers this question. It turns out that there are a couple surprising subtleties to this flag. Here are a couple:
set -e works on "simple commands"
A script will exit early if the exit status of a simple command is nonzero. So how is a simple command executed? In short, bash does all expansions and checks to see if there is still a command to run. If there is a command to run, the exit status of the simple command is the exit status of the command. If there is not a command to run, the exit status of the simple command is the exit status of the last command substitution performed. Here are some example commands that all have exit status 0, so would not cause a set -e
script to exit:
# echo, local and export are commands that always have exit status 0
echo "$(/bin/false)"
local foo="$(/bin/false)"
export foo="$(/bin/false)"
# the last command substitution has exit status 0
foo="$(/bin/false)$(/bin/true)"
set -e does not get passed to subshells in command substitution (without --posix)
Here is an example consequence of this:
set -e
foo() {
/bin/false
echo "foo"
}
echo "$(foo)"
Running this script with bash
will print "foo" while running this with bash --posix
(or sh
) will not. Both scripts will exit with status 0.
Tangible takeaway
This is not to say that something like set -euo pipefail
should not be used at the top of all bash scripts, but it should not give you a false sense of security. Like all production code, you must reason about all failure conditions and ensure they are handled appropriately. Even if you are some kind of bash expert who knows all these subtleties, chances are your peers do not. The execution of shell scripts is subtle and confusing, and for production code, there is likely a better tool for the job.
A general rule of thumb I find quite handy is to use only functions “that cannot fail” in
$(…)
expansions, because it can be quite difficult to handle the failure. So, generating JSON or S-EXPR payload from pre-computed data (either stored in variables or in file) is a good fit, but performing a request on a complex system is not.Actually, the short snippet you posted suggests it has been written by someone who tries to use shell programming as a regular imperative programming language, therefore mimicking the constructs of such languages. The day I started to consider shell programming as a field on its own, I made significant progresses: mimicking constructs from imperative languages is almost always a failure. Not a blatant failure, but a nasty, hidden, failure, that silently bites. Here, the failure is to use the shell to do data processing, even in its easiest form like adding parenthesis around a string.
(I leave aside the fact that
echo
should never be used to print random text.) It is much more robust to use a function that reads a list of users and prints the corresponding S-EXPR.For instance:
list2sexpr()
{
awk '
BEGIN { printf("(") }
NR > 1 { printf(" ") }
{printf("%s", $0)}
END { printf(")") }
'
}
ldap-query-for-valid-users | list2sexpr > "/tmp/all-users.sexp"
SED-ists can use the following alternative function
list2sexpr()
{
sed -n -e 'H;${x;s/\n/(/;s/$/)/;s/\n/ /g;p;q;}'
}
Or PERL
list2sexpr()
{
perl -e '@a=<>; chomp @a; print "(", (join " ", @a), ")"'
}
Of course any of the definitions of
list2expr
is more complicated, but it defines a small function reading some input and writing some output. However the resulting codeldap-query-for-valid-users | list2sexpr > "/tmp/all-users.sexp"
is easier to read and catches failures.
I wrote 2 years ago a more substantial example (yet very short)
As an aside I would like to mention a not-so-well-known facts. Functions run in a subshell if they are defined with
(…)
instead of{…}
which makes an easy and effective way to limit the scope of any shell falgs we want to use.(For some reason the formatting of my post is terrible and does not match at all the preview. I hope it can still be read!)
So, probably a first time caller, but not a long-time listener of the show?
we need more (ba)sh idioms
In addition to
echo
all command substitiutions like this seem to pass — now IIRC seen this before, but it always gets forgotten — thanks for reminding 😀bash -eufc ': `/bin/false`'; echo $?
0
in this case:
query_output=$(ldap-query-for-valid-users)
echo "($query_output)" > /tmp/all-users.sexp
if local, export, etc, desired, put those in separate expression. e.g.
local query_output; query_output=...
VAR=value; export VAR
on a related note try,
bash -eufc 'foo=bar; export $foo=bar; echo $bar'
anyway, for “solution” one needs:
better test coverage (use
bash -euc 'set -o pipefail; ...'; echo $?
)more checklists, idioms, etc. to look through when writing scripts.
hmm. markdown got lost in output :O
ok, it did not, it just looks different (highlighting), sorry about this (continuing) noise 😉
For subshells, there is ‘set -E’
Oh, I thought you were going to explain the impact that this had on your particular script, which is not entirely obvious from the snippets you gave. In the future this would be nice to bring things full circle and not leave the reader hanging. Was the result that all of your users got emailed unnecessarily? If so I think the issue is not the script itself but the underlying logic driving the script, which needs to be inverted.
Avoiding bash scripts all together as much as possible is one of those gifts that keeps on giving. Being very accessible, yet quirky and with limited potential makes it very dangerous. Just go search github for common bash errors and you’ll find plenty.
As with most programming, you really should be checking the error conditions at all times. That said – I tend to treat Bash like any other programming language – using it functions and variables to make the logic obvious, and the control functionality to check the values, almost always catching the exit code of a function or program call into a variable that is then checked.
Like with anything you can have very poorly written scripts (or programs). Those typically do not check errors, rely on exception handling at higher levels, and break easily in very non-obvious ways.
Thus:
function foo() {
echo do something
return 0
}
foo()
let -i result=$?
if [ ${result} -eq 0 ]; then
echo “handle the success case”
else
echo “handle the negative case(s)”
fi
is much preferable to:
if [ $(doSomeMassivePipedThingHere) ]; then
echo “handle the success case”
fi
BashFAQ #105 — at http://mywiki.wooledge.org/BashFAQ/105 — is essential reading for anyone considering the use of
set -e
.I saw this program written entirely in bash. http://www.github.com/nshaibu/QBox. It is a VMM.