The Core Language
C provides no memory safety. Most recommendations in this section deal with this aspect of the language.
Undefined Behavior
The C standard leaves the behavior of some constructs undefined. This means more than that the standard does not describe what happens when such a construct is executed: optimizing compilers such as GCC are also allowed to assume that the construct is never reached. In some cases, this has caused GCC to optimize security checks away. (This is not a flaw in GCC or the C language. But C certainly has some areas which are more difficult to use than others.)
Common sources of undefined behavior are:
- out-of-bounds array accesses
- null pointer dereferences
- overflow in signed integer arithmetic
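As an illustration of how a security check can disappear, consider a null pointer check placed after the dereference it is meant to guard. The following is a minimal sketch; the struct and function names are hypothetical and serve only this illustration.

```c
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
struct packet {
  int length;
};

/* Risky ordering: p is dereferenced before it is checked.  Because
   dereferencing a null pointer is undefined, the compiler may assume
   that p is not null and remove the check that follows. */
int
packet_length_unchecked(const struct packet *p)
{
  int length = p->length;
  if (p == NULL) {              /* may be optimized away */
    return -1;
  }
  return length;
}

/* Safer ordering: the check comes before any dereference. */
int
packet_length(const struct packet *p)
{
  if (p == NULL) {
    return -1;
  }
  return p->length;
}
```

Only the second function can be relied upon to return -1 for a null argument.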
Recommendations for Pointers and Array Handling
Always keep track of the size of the array you are working with. Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by subtracting the current position from that pointer. The alternative, updating a separate variable every time the position is advanced, is usually less obviously correct.
Array processing in C shows how to extract Pascal-style strings from a character buffer. The two pointers kept for length checks are inend and outend; inp and outp are the respective positions. The number of input bytes is checked using the expression len > (size_t)(inend - inp). The cast silences a compiler warning; inend is never smaller than inp at that point, so the difference is never negative.
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

ssize_t
extract_strings(const char *in, size_t inlen, char **out, size_t outlen)
{
  const char *inp = in;
  const char *inend = in + inlen;
  char **outp = out;
  char **outend = out + outlen;

  while (inp != inend) {
    size_t len;
    char *s;
    if (outp == outend) {
      errno = ENOSPC;
      goto err;
    }
    /* The first byte of each string is its length. */
    len = (unsigned char)*inp;
    ++inp;
    if (len > (size_t)(inend - inp)) {
      errno = EINVAL;
      goto err;
    }
    s = malloc(len + 1);
    if (s == NULL) {
      goto err;
    }
    memcpy(s, inp, len);
    inp += len;
    s[len] = '\0';
    *outp = s;
    ++outp;
  }
  return outp - out;
 err:
  {
    /* Free the strings allocated so far, preserving errno. */
    int errno_old = errno;
    while (out != outp) {
      free(*out);
      ++out;
    }
    errno = errno_old;
  }
  return -1;
}
It is important that the length checks always have the form len > (size_t)(inend - inp), where len is a variable of type size_t which denotes the total number of bytes which are about to be read or written next. In general, it is not safe to fold multiple such checks into one, as in len1 + len2 > (size_t)(inend - inp), because the expression on the left can overflow or wrap around (see Recommendations for Integer Arithmetic), and it no longer reflects the number of bytes to be processed.
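One way to check two lengths without forming their sum is to test them one after the other, reducing the remaining space by the first length before testing the second. The sketch below uses a hypothetical helper, fits_two, which is not part of the example above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if len1 bytes followed by len2
   bytes can be read starting at inp without going past inend.
   Assumes inp <= inend and that both point into (or just past the
   end of) the same array. */
bool
fits_two(const char *inp, const char *inend, size_t len1, size_t len2)
{
  size_t remaining = (size_t)(inend - inp);
  if (len1 > remaining) {
    return false;
  }
  /* Cannot wrap around: len1 <= remaining was just established. */
  remaining -= len1;
  return len2 <= remaining;
}
```

With a 10-byte buffer, the lengths (size_t)-1 and 2 are rejected here, whereas their sum wraps around to 1 and would pass a folded check.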
Recommendations for Integer Arithmetic
Overflow in signed integer arithmetic is undefined. This means that it is not possible to check for overflow after it has happened; see Incorrect overflow detection in C.
void report_overflow(void);

int
add(int a, int b)
{
  int result = a + b;
  if (a < 0 || b < 0) {
    return -1;
  }
  // The compiler can optimize away the following if statement.
  if (result < 0) {
    report_overflow();
  }
  return result;
}
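For contrast, a check that inspects the operands before the addition takes place does not depend on undefined behavior. The following sketch is one possible variant of the function above. The name add_checked is hypothetical, report_overflow is given a stub body only to keep the sketch self-contained, and returning -1 on overflow is an assumption about the desired interface.

```c
#include <limits.h>

/* Stub for illustration; counts how often overflow was reported. */
static int overflow_reports;

static void
report_overflow(void)
{
  ++overflow_reports;
}

/* Adds two non-negative integers.  Returns -1 for negative input or
   if the sum would exceed INT_MAX. */
int
add_checked(int a, int b)
{
  if (a < 0 || b < 0) {
    return -1;
  }
  /* b >= 0, so INT_MAX - b cannot overflow. */
  if (a > INT_MAX - b) {
    report_overflow();
    return -1;
  }
  return a + b;
}
```

The addition is only performed once it is known that the result fits, so the compiler has no undefined behavior to reason from.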
The following approaches can be used to check for overflow, without actually causing it.
- Use a wider type to perform the calculation, check that the result is within bounds, and convert the result to the original type. All intermediate results must be checked in this way.
- Perform the calculation in the corresponding unsigned type and use bit fiddling to detect the overflow. Overflow checking for unsigned addition shows how to perform an overflow check for unsigned integer addition. For three or more terms, all the intermediate additions have to be checked in this way.
void report_overflow(void);

unsigned
add_unsigned(unsigned a, unsigned b)
{
  unsigned sum = a + b;
  if (sum < a) { // or sum < b
    report_overflow();
  }
  return sum;
}
- Compute bounds for acceptable input values which are known to avoid overflow, and reject other values. This is the preferred way for overflow checking on multiplications, see Overflow checking for unsigned multiplication.
unsigned
mul(unsigned a, unsigned b)
{
  if (b && a > ((unsigned)-1) / b) {
    report_overflow();
  }
  return a * b;
}
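The first approach in the list above, using a wider type, could look like the following sketch for signed multiplication. The function name is hypothetical, and the code assumes that long long is at least twice as wide as int, which holds on common platforms but is not guaranteed by the C standard (the static assertion makes the assumption explicit).

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: the wider type can hold any product of two int values. */
_Static_assert(sizeof(long long) >= 2 * sizeof(int),
               "long long must be at least twice as wide as int");

/* Multiplies two int values using long long as the wider type.
   Stores the product in *result and returns true if it fits into an
   int; returns false and leaves *result untouched otherwise. */
bool
mul_int_checked(int a, int b, int *result)
{
  long long wide = (long long)a * (long long)b;
  if (wide < INT_MIN || wide > INT_MAX) {
    return false;
  }
  *result = (int)wide;
  return true;
}
```

If further operations follow, each intermediate result has to go through the same bounds check before it is used.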
Basic arithmetic operations are commutative, so for bounds checks, there are two different but mathematically equivalent expressions. Sometimes, one of the expressions results in better code because parts of it can be reduced to a constant. This applies to overflow checks for multiplication a * b involving a constant a, where the expression is reduced to b > C for some constant C determined at compile time. The other expression, b && a > ((unsigned)-1) / b, is more difficult to optimize at compile time.
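For a multiplication by a compile-time constant, the bound can be written so that the division involves only constants. The sketch below assumes a hypothetical element size of 16 bytes; the macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element size, for illustration only. */
#define ELEMENT_SIZE ((size_t)16)

/* Computes count * ELEMENT_SIZE.  Returns false if the product would
   not fit into a size_t.  The bound (size_t)-1 / ELEMENT_SIZE is a
   constant expression, so the check is a single comparison of count
   against a constant. */
bool
array_bytes(size_t count, size_t *bytes)
{
  if (count > (size_t)-1 / ELEMENT_SIZE) {
    return false;
  }
  *bytes = count * ELEMENT_SIZE;
  return true;
}
```

Here the variable operand is compared against the constant quotient, which corresponds to the b > C form described above.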
When a value is converted to a signed integer type that cannot represent it, GCC always chooses the result based on 2’s complement arithmetic: the value is reduced modulo 2^N, where N is the width of the type. This GCC extension (which is also implemented by other compilers) helps a lot when implementing overflow checks.
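As an illustration, signed addition can be checked by doing the arithmetic in the unsigned type and inspecting the sign bits. This is a sketch under stated assumptions: the function name is hypothetical, unsigned is assumed to have exactly one more value bit than int, and the final conversion of a value above INT_MAX relies on the GCC behavior described above.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b.  Stores the sum in *result and returns true if it is
   representable as an int; returns false otherwise. */
bool
add_signed_checked(int a, int b, int *result)
{
  unsigned ua = (unsigned)a;
  unsigned ub = (unsigned)b;
  unsigned usum = ua + ub;      /* wraps around, never undefined */
  unsigned sign_bit = (unsigned)INT_MAX + 1u;

  /* Overflow occurred exactly when both operands have the same sign
     and the sign of the sum differs from it. */
  if (((ua ^ usum) & (ub ^ usum) & sign_bit) != 0) {
    return false;
  }
  /* Implementation-defined for usum > INT_MAX; GCC reduces the value
     modulo 2^N, which yields the expected negative result. */
  *result = (int)usum;
  return true;
}
```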
Sometimes, it is necessary to compare unsigned and signed integer variables. This results in a compiler warning, comparison between signed and unsigned integer expressions, because the comparison often gives unexpected results for negative values. When adding a cast, make sure that negative values are covered properly. If the bound is unsigned and the checked quantity is signed, you should cast the checked quantity to an unsigned type at least as wide as either operand type. As a result, negative values will fail the bounds check. (You can still check for negative values separately for clarity, and the compiler will optimize away this redundant check.)
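A minimal sketch of such a bounds check follows. The function name is hypothetical, and the code assumes that size_t is at least as wide as long, which is the case on common platforms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the signed value index is a valid position in an
   array of size elements.  The cast converts negative values of
   index to very large unsigned values, so they fail the comparison
   just like values that are too large. */
bool
index_in_bounds(long index, size_t size)
{
  return (size_t)index < size;
}
```

Writing index >= 0 && (size_t)index < size instead is equivalent and may be easier to read.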
Legacy code should be compiled with the -fwrapv GCC option. As a result, GCC will provide 2’s complement semantics for integer arithmetic, including defined behavior on integer overflow.
Global Variables
Global variables should be avoided because they usually lead to thread safety hazards. In any case, they should be declared static, so that access is restricted to a single translation unit.
Global constants are not a problem, but declaring them can be tricky. Declaring a constant array of constant strings shows how to declare a constant array of constant strings. The second const is needed to make the array constant, and not just the strings. It must be placed after the *, and not before it.
static const char *const string_list[] = {
  "first",
  "second",
  "third",
  NULL
};
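To illustrate what each const protects, the sketch below repeats the declaration and walks the list up to its NULL terminator. The function name is hypothetical; the two assignments in the comment are expected to be rejected by the compiler.

```c
#include <stddef.h>

static const char *const string_list[] = {
  "first",
  "second",
  "third",
  NULL
};

/* Neither of these compiles:
     string_list[0] = "other";    (second const: array is constant)
     string_list[0][0] = 'F';     (first const: strings are constant) */

/* Counts the entries before the terminating NULL. */
size_t
string_list_length(void)
{
  size_t count = 0;
  for (const char *const *p = string_list; *p != NULL; ++p) {
    ++count;
  }
  return count;
}
```

Note that a pointer used to iterate over the array needs both qualifiers as well.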
Sometimes, static variables local to functions are used as a replacement for proper memory management. Unlike non-static local variables, it is possible to return a pointer to static local variables to the caller. Such variables are well-hidden but effectively global (just like static variables at file scope). It is difficult to add thread safety afterwards if such interfaces are used. Merely dropping the static keyword in such cases leads to undefined behavior, because the returned pointer then refers to an object whose lifetime has ended.
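One common alternative is to let the caller supply the buffer, so that no hidden shared state remains. The following sketch contrasts the two interface styles; both functions and the "id-" format are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Problematic interface: all callers (and all threads) share the
   same static buffer, and each call overwrites the previous result. */
const char *
format_id_static(unsigned id)
{
  static char buf[32];
  snprintf(buf, sizeof(buf), "id-%u", id);
  return buf;
}

/* Alternative: the caller owns the storage.  Returns 0 on success,
   or -1 if the buffer is too small or formatting failed. */
int
format_id(unsigned id, char *buf, size_t buflen)
{
  int ret = snprintf(buf, buflen, "id-%u", id);
  if (ret < 0 || (size_t)ret >= buflen) {
    return -1;
  }
  return 0;
}
```

With the second interface, two threads can format identifiers at the same time as long as each passes its own buffer.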
Another source for static local variables is a desire to reduce stack space usage on embedded platforms, where the stack may span only a few hundred bytes. If this is the only reason why the static keyword is used, it can just be dropped, unless the object is very large (larger than 128 kilobytes on 32-bit platforms). In the latter case, it is recommended to allocate the object using malloc, to obtain proper array checking, for the same reasons outlined in [sect-Defensive_Coding-C-Allocators-alloca].
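A minimal sketch of the malloc-based variant for a large object follows. The buffer size and the function are hypothetical; the computation merely stands in for real work on the scratch area.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size, chosen above the 128 kilobyte mark. */
#define SCRATCH_SIZE ((size_t)256 * 1024)

/* Fills a large scratch buffer with the byte value fill and returns
   the sum of its bytes, or -1 if the allocation fails.  The buffer
   comes from malloc instead of being a static or automatic array. */
long
sum_scratch(unsigned char fill)
{
  unsigned char *scratch = malloc(SCRATCH_SIZE);
  long sum = 0;
  if (scratch == NULL) {
    return -1;
  }
  memset(scratch, fill, SCRATCH_SIZE);
  for (size_t i = 0; i < SCRATCH_SIZE; ++i) {
    sum += scratch[i];
  }
  free(scratch);
  return sum;
}
```

Unlike a static array, the buffer is private to each call, and an allocation failure is reported to the caller instead of exhausting the stack.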