Shell Programming and bash
This chapter contains advice about shell programming, specifically in bash. Most of the advice will apply to scripts written for other shells because extensions such as integer or array variables have been implemented there as well, with comparable syntax.
Consider Alternatives
Once a shell script is so complex that advice in this chapter applies, it is time to step back and consider the question: Is there a more suitable implementation language available?
For example, Python with its subprocess
module
can be used to write scripts which are almost as concise as shell
scripts when it comes to invoking external programs, and Python
offers richer data structures, with less arcane syntax and more
consistent behavior.
Shell Language Features
The following sections cover subtleties concerning the shell programming languages. They have been written with the bash shell in mind, but some of these features apply to other shells as well.
Some of the features described may seem like implementation defects, but these features have been replicated across multiple independent implementations, so they now have to be considered part of the shell programming language.
Parameter Expansion
The mechanism by which named shell variables and parameters are
expanded is called parameter expansion. The
most basic syntax is “$variable” or “${variable}”.
In almost all cases, a parameter expansion should be enclosed in
double quotation marks (“…”):
external-program "$arg1" "$arg2"
If the double quotation marks are omitted, the value of the
variable will be split according to the current value of the
IFS variable. This may allow the injection of additional options
which are then processed by external-program.
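The difference can be illustrated with a small sketch; external-program stands for any command, and the option value is made up:
# Hypothetical attacker-controlled value containing a space.
arg='--force other-file'

# Unquoted: the value is split on IFS, so "--force" and "other-file"
# arrive as two separate arguments and may be parsed as an option.
external-program $arg

# Quoted: the whole value is passed as one argument.
external-program "$arg"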
Parameter expansion can use special syntax for specific features, such as substituting defaults or performing string or array operations. These constructs should not be used because they can trigger arithmetic evaluation, which can result in code execution. See Arithmetic Evaluation.
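As a rough sketch (variable names are illustrative, and the exact behavior depends on the bash version), the offset in the string slicing construct is treated as an arithmetic expression:
string=abcdef
offset='1+1'

# The offset is arithmetically evaluated, so this prints "cdef".
# If offset held something like 'x[$(some-command)]', the evaluation
# could execute some-command via double expansion.
echo "${string:offset}"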
Double Expansion
Double expansion occurs when, during the expansion of a shell variable, not just the variable is expanded, replacing it by its value, but the value of the variable is itself expanded as well. This can trigger arbitrary code execution, unless the value of the variable is verified against a restrictive pattern.
The evaluation process is in fact recursive, so a self-referential expression can cause an out-of-memory condition and a shell crash.
Double expansion may seem like a defect, but it is implemented by many shells and has to be considered an integral part of the shell programming language. However, it does make writing robust shell scripts difficult.
Double expansion can be requested explicitly with the eval
built-in command, or by invoking a subshell with “bash -c”.
These constructs should not be used.
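A minimal sketch of the danger, with a made-up variable value: eval performs a second round of expansion on data that has already been expanded once.
# Hypothetical attacker-controlled value.
value='$(touch /tmp/injected)'

echo "$value"        # single expansion: prints the literal string
eval "echo $value"   # double expansion: the command substitution runs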
The following sections give examples of places where implicit double expansion occurs.
Arithmetic Evaluation
Arithmetic evaluation is a process by which the shell computes the integer value of an expression specified as a string. It is highly problematic for two reasons: It triggers double expansion (see Double Expansion), and the language of arithmetic expressions is not self-contained. Some constructs in arithmetic expressions (notably array subscripts) provide a trapdoor from the restricted language of arithmetic expressions to the full shell language, thus paving the way towards arbitrary code execution. Due to double expansion, input which is (indirectly) referenced from an arithmetic expression can trigger execution of arbitrary code, which is potentially harmful.
Arithmetic evaluation is triggered by the following constructs:
- The expression in “$((expression))” is evaluated. This construct
  is called arithmetic expansion.
- “$[expression]” is a deprecated syntax with the same effect.
- The arguments to the let shell built-in are evaluated.
- “((expression))” is an alternative syntax for “let expression”.
- Conditional expressions surrounded by “[[…]]” can trigger
  arithmetic evaluation if certain operators such as -eq are used.
  (The test built-in does not perform arithmetic evaluation, even
  with integer operators such as -eq.)
  The conditional expression “[[ $variable =~ regexp ]]” can be
  used for input validation, assuming that regexp is a constant
  regular expression. See Performing Input Validation.
- Certain parameter expansions, for example “${variable[expression]}”
  (array indexing) or “${variable:expression}” (string slicing),
  trigger arithmetic evaluation of expression.
- Assignment to array elements using
  “array_variable[subscript]=expression” triggers evaluation of
  subscript, but not expression.
- The expressions in the arithmetic for command,
  “for ((expression1; expression2; expression3)); do commands; done”,
  are evaluated. This does not apply to the regular for command,
  “for variable in list; do commands; done”.
Depending on the bash version, the above list may be incomplete. If faced with a situation where using such shell features appears necessary, see Consider Alternatives.
If it is impossible to avoid shell arithmetic on untrusted inputs, refer to Performing Input Validation.
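The following sketch shows how an untrusted value that reaches arithmetic evaluation can execute a command via an array subscript; the variable and command are made up, and the exact behavior depends on the bash version:
# Hypothetical untrusted input containing a command substitution
# inside an array subscript.
value='x[$(date >&2)]'

# Arithmetic evaluation expands the value recursively, so the
# embedded command substitution may be executed.
echo "$(( value ))"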
Type Declarations
bash supports explicit type declarations for shell variables:
declare -i integer_variable
declare -a array_variable
declare -A assoc_array_variable
typeset -i integer_variable
typeset -a array_variable
typeset -A assoc_array_variable
local -i integer_variable
local -a array_variable
local -A assoc_array_variable
readonly -i integer_variable
readonly -a array_variable
readonly -A assoc_array_variable
Variables can also be declared as arrays by assigning them an array expression, as in:
array_variable=(1 2 3 4)
Some built-ins (such as mapfile) can implicitly create array
variables.
Such type declarations should not be used because assignment to such variables (independent of the concrete syntax used for the assignment) triggers arithmetic expansion (and thus double expansion) of the right-hand side of the assignment operation. See Arithmetic Evaluation.
Shell scripts which use integer or array variables should be rewritten in another, more suitable language. See Consider Alternatives.
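A short sketch of the hazard with integer variables (names are illustrative): after declare -i, every assignment arithmetically evaluates its right-hand side.
declare -i count

# The right-hand side is evaluated as an arithmetic expression,
# so this prints 5, not the literal string "2+3". A value such as
# 'x[$(some-command)]' could trigger command execution instead.
untrusted='2+3'
count="$untrusted"
echo "$count"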
Other Obscurities
Obscure shell language features should not be used. Examples are:
- Exported functions (export -f or declare -f).
- Function names which are not valid variable names, such as
  “module::function”.
- The possibility to override built-ins or external commands with
  shell functions.
- Changing the value of the IFS variable to tokenize strings.
Invoking External Commands
When passing shell variables as single command line arguments, they should always be surrounded by double quotes. See Parameter Expansion.
Care is required when passing untrusted values as positional
parameters to external commands. If the value starts with a hyphen
“-”, it may be interpreted by the external command as an option.
Depending on the external program, a “--” argument stops option
processing and treats all following arguments as positional
parameters. (Double quotes are completely invisible to the command
being invoked, so they do not prevent variable values from being
interpreted as options.)
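A brief sketch (the file name is made up) of using “--” with a command that supports it:
# Hypothetical untrusted value that starts with a hyphen.
filename='-r important-data'

# Without "--", rm could parse the value as options; with "--",
# it is treated as a file name operand.
rm -- "$filename"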
Cleaning the environment before invoking child processes is
difficult to implement in a script. bash keeps a hidden list of
environment variables which do not correspond to shell variables,
and unsetting them from within a bash script is not possible. To
reset the environment, a script can re-run itself under the
“env -i” command with an additional parameter which indicates that
the environment has been cleared and suppresses a further
self-execution. Alternatively, individual commands can be executed
with “env -i”.
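One possible sketch of the self-re-execution pattern; the marker variable CLEANED_ENV is an arbitrary name, and the set of preserved variables (here only PATH) is an assumption:
#!/bin/bash
# Re-run the script under "env -i" exactly once, using a marker
# variable to suppress a second self-execution.
if [ -z "$CLEANED_ENV" ] ; then
    exec env -i CLEANED_ENV=1 PATH=/usr/bin:/bin "$0" "$@"
fi

# From here on, the script runs with a mostly empty environment.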
Complete isolation from its original execution environment (which is required when the script is executed after a trust transition, e.g., triggered by the SUID mechanism) is impossible to achieve from within the shell script itself. Instead, the invoking process has to clear the process environment (except for a few trusted variables) before running the shell script.
Checking for failures in executed external commands is recommended.
If no elaborate error recovery is needed, invoking “set -e” may be
sufficient. This causes the script to stop on the first failed
command. However, failures in pipes (“command1 | command2”) are
only detected for the last command in the pipe; errors in earlier
commands are ignored. This can be changed by invoking
“set -o pipefail”.
Due to architectural limitations, only the process that spawned
the entire pipe can check for failures in individual commands;
it is not possible for a process to tell if the process feeding
data (or the process consuming data) exited normally or with
an error.
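A minimal sketch of enabling both options near the top of a script (command1 and command2 are placeholders):
#!/bin/bash
set -e            # stop on the first failed command
set -o pipefail   # a pipe fails if any command in it fails

# With pipefail, a failure of command1 is no longer masked by a
# succeeding command2.
command1 | command2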
See the section on process creation for additional details on creating child processes.
Temporary Files
Temporary files should be created with the
mktemp
command, and temporary directories with
“mktemp -d
”.
To clean up temporary files and directories, write a clean-up shell function and register it as a trap handler, as shown in Creating and Cleaning up Temporary Files. Using a separate function avoids issues with proper quoting of variables.
# Create the temporary file first, then register the clean-up handler.
tmpfile="$(mktemp)"
cleanup () {
    rm -f -- "$tmpfile"
}
# Invoke the clean-up function when the shell exits (trap on 0).
trap cleanup 0
Performing Input Validation
In some cases, input validation cannot be avoided. For example, if arithmetic evaluation is absolutely required, it is imperative to check that input values are, in fact, integers. See Arithmetic Evaluation.
Input validation in bash
shows a construct which can be used to check if a string
“$value
” is an integer. This construct is
specific to bash and not portable to
POSIX shells.
if [[ $value =~ ^-?[0-9]+$ ]] ; then
    echo "value is an integer"
else
    echo "value is not an integer" 1>&2
    exit 1
fi
Using case
statements for input validation is
also possible and supported by other (POSIX) shells, but the
pattern language is more restrictive, and it can be difficult to
write suitable patterns.
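As an illustration of the case-based approach, the following sketch removes at most one leading minus sign and then rejects anything that is empty or contains a non-digit; it is meant as a rough POSIX-compatible equivalent of the regular expression above, not a drop-in replacement:
case $value in
    -*) digits=${value#-} ;;   # strip a single leading minus sign
    *)  digits=$value ;;
esac
case $digits in
    "" | *[!0-9]*)
        echo "value is not an integer" 1>&2
        exit 1
        ;;
    *)
        echo "value is an integer"
        ;;
esac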
The expr
external command can give misleading
results (e.g., if the value being checked contains operators
itself) and should not be used.
Guarding Shell Scripts Against Changes
bash only reads a shell script up to the point it needs to execute the next command. This means that if a script is overwritten while it is running, execution can jump to a random part of the script, depending on what is modified in the script and how the file offsets change as a result. (This behavior is needed to support self-extracting shell archives whose script part is followed by a stream of bytes which does not follow the shell language syntax.)
Therefore, long-running scripts should be guarded against
concurrent modification by putting as much of the program logic
into a main
function, and invoking the
main
function at the end of the script, using
this syntax:
main "$@" ; exit $?
This construct ensures that bash will
stop execution after the main
function, instead
of opening the script file and trying to read more commands.
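A compact sketch of the resulting overall script structure (the body of main is a placeholder):
#!/bin/bash

# All program logic lives inside functions; the final line runs
# main only after bash has read the entire file.
main () {
    local arg
    for arg in "$@" ; do
        printf 'argument: %s\n' "$arg"
    done
}

main "$@" ; exit $?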