Untitled :: Defensive Coding Guide

The Core Language

C provides no memory safety. Most recommendations in this section deal with this aspect of the language.

Undefined Behavior

Some C constructs are defined to be undefined by the C standard. This does not only mean that the standard does not describe what happens when the construct is executed. It also allows optimizing compilers such as GCC to assume that this particular construct is never reached. In some cases, this has caused GCC to optimize security checks away. (This is not a flaw in GCC or the C language. But C certainly has some areas which are more difficult to use than others.)

Common sources of undefined behavior are:

out-of-bounds array accesses
null pointer dereferences
overflow in signed integer arithmetic

Recommendations for Pointers and Array Handling

Always keep track of the size of the array you are working with. Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by subtracting the current position from that pointer. The alternative, updating a separate variable every time when the position is advanced, is usually less obviously correct.

Array processing in C shows how to extract Pascal-style strings from a character buffer. The two pointers kept for length checks are inend and outend. inp and outp are the respective positions. The number of input bytes is checked using the expression len > (size_t)(inend - inp). The cast silences a compiler warning; inend is always larger than inp.

Example 1. Array processing in C

ssize_t
extract_strings(const char *in, size_t inlen, char **out, size_t outlen)
{
  const char *inp = in;
  const char *inend = in + inlen;
  char **outp = out;
  char **outend = out + outlen;

  while (inp != inend) {
    size_t len;
    char *s;
    if (outp == outend) {
      errno = ENOSPC;
      goto err;
    }
    len = (unsigned char)*inp;
    ++inp;
    if (len > (size_t)(inend - inp)) {
      errno = EINVAL;
      goto err;
    }
    s = malloc(len + 1);
    if (s == NULL) {
      goto err;
    }
    memcpy(s, inp, len);
    inp += len;
    s[len] = '\0';
    *outp = s;
    ++outp;
  }
  return outp - out;
err:
  {
    int errno_old = errno;
    while (out != outp) {
      free(*out);
      ++out;
    }
    errno = errno_old;
  }
  return -1;
}

It is important that the length checks always have the form len > (size_t)(inend - inp), where len is a variable of type size_t which denotes the total number of bytes which are about to be read or written next. In general, it is not safe to fold multiple such checks into one, as in len1 + len2 > (size_t)(inend - inp), because the expression on the left can overflow or wrap around (see Recommendations for Integer Arithmetic), and it no longer reflects the number of bytes to be processed.

Recommendations for Integer Arithmetic

Overflow in signed integer arithmetic is undefined. This means that it is not possible to check for overflow after it happened, see Incorrect overflow detection in C.

Example 2. Incorrect overflow detection in C

void report_overflow(void);

int
add(int a, int b)
{
  int result = a + b;
  if (a < 0 || b < 0) {
    return -1;
  }
  // The compiler can optimize away the following if statement.
  if (result < 0) {
    report_overflow();
  }
  return result;
}

The following approaches can be used to check for overflow, without actually causing it.

Use a wider type to perform the calculation, check that the result is within bounds, and convert the result to the original type. All intermediate results must be checked in this way.
Perform the calculation in the corresponding unsigned type and use bit fiddling to detect the overflow. Overflow checking for unsigned addition shows how to perform an overflow check for unsigned integer addition. For three or more terms, all the intermediate additions have to be checked in this way.

Example 3. Overflow checking for unsigned addition

void report_overflow(void);

unsigned
add_unsigned(unsigned a, unsigned b)
{
  unsigned sum = a + b;
  if (sum < a) { // or sum < b
    report_overflow();
  }
  return sum;
}

Compute bounds for acceptable input values which are known to avoid overflow, and reject other values. This is the preferred way for overflow checking on multiplications, see Overflow checking for unsigned multiplication.

Example 4. Overflow checking for unsigned multiplication

unsigned
mul(unsigned a, unsigned b)
{
  if (b && a > ((unsigned)-1) / b) {
    report_overflow();
  }
  return a * b;
}

Basic arithmetic operations are commutative, so for bounds checks, there are two different but mathematically equivalent expressions. Sometimes, one of the expressions results in better code because parts of it can be reduced to a constant. This applies to overflow checks for multiplication a * b involving a constant a, where the expression is reduced to b > C for some constant C determined at compile time. The other expression, b && a > ((unsigned)-1) / b, is more difficult to optimize at compile time.

When a value is converted to a signed integer, GCC always chooses the result based on 2’s complement arithmetic. This GCC extension (which is also implemented by other compilers) helps a lot when implementing overflow checks.

Sometimes, it is necessary to compare unsigned and signed integer variables. This results in a compiler warning, comparison between signed and unsigned integer expressions, because the comparison often gives unexpected results for negative values. When adding a cast, make sure that negative values are covered properly. If the bound is unsigned and the checked quantity is signed, you should cast the checked quantity to an unsigned type as least as wide as either operand type. As a result, negative values will fail the bounds check. (You can still check for negative values separately for clarity, and the compiler will optimize away this redundant check.)

Legacy code should be compiled with the -fwrapv GCC option. As a result, GCC will provide 2’s complement semantics for integer arithmetic, including defined behavior on integer overflow.

Global Variables

Global variables should be avoided because they usually lead to thread safety hazards. In any case, they should be declared static, so that access is restricted to a single translation unit.

Global constants are not a problem, but declaring them can be tricky. Declaring a constant array of constant strings shows how to declare a constant array of constant strings. The second const is needed to make the array constant, and not just the strings. It must be placed after the *, and not before it.

Example 5. Declaring a constant array of constant strings

static const char *const string_list[] = {
  "first",
  "second",
  "third",
  NULL
};

Sometimes, static variables local to functions are used as a replacement for proper memory management. Unlike non-static local variables, it is possible to return a pointer to static local variables to the caller. But such variables are well-hidden, but effectively global (just as static variables at file scope). It is difficult to add thread safety afterwards if such interfaces are used. Merely dropping the static keyword in such cases leads to undefined behavior.

Another source for static local variables is a desire to reduce stack space usage on embedded platforms, where the stack may span only a few hundred bytes. If this is the only reason why the static keyword is used, it can just be dropped, unless the object is very large (larger than 128 kilobytes on 32-bit platforms). In the latter case, it is recommended to allocate the object using malloc, to obtain proper array checking, for the same reasons outlined in [sect-Defensive_Coding-C-Allocators-alloca].