Bug 1041336

Summary:	attribute((nonnull)) is worthless, but shouldn't be
Product:	Red Hat Enterprise Linux 7	Reporter:	Eric Blake <eblake>
Component:	gcc	Assignee:	Jakub Jelinek <jakub>
Status:	CLOSED UPSTREAM	QA Contact:	qe-baseos-tools-bugs
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	7.0	CC:	armbru, eblake, law, mpolacek, rjones
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-12-13 17:36:01 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Eric Blake 2013-12-12 15:12:38 UTC

Description of problem:
See this upstream gcc bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
Basically, gcc's implementation of attribute((nonnull)) is so useless that at least the libvirt project has taken the stance of avoiding the attribute when compiling under gcc (it causes more bugs than it cures) and only using the attribute when compiling under clang or Coverity (where the static checking is actually enhanced by the annotation).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Compile code that uses attribute((nonnull))
2.
3.

Actual results:
Observe that gcc uses the attribute to optimize out null checks in the function body, but does not emit any warnings when callers pass provably null arguments to those functions.  This makes programs silently misbehave in hard-to-diagnose manners. Without a way to enforce that callers are using sane semantics, optimizing the callee makes bugs in the callers much harder to decipher.

Expected results:
gcc should loudly complain if any caller attempts to pass a provably NULL argument to a parameter marked nonnull.

Additional info:

Comment 2 Jakub Jelinek 2013-12-12 19:33:45 UTC

???

__attribute__((nonnull)) int *
foo (int *p)
{
  return p;
}

int *
bar (void)
{
  return foo ((int *) 0);
}

gcc -S -O2 -Wall rh1041336.c
rh1041336.c: In function ‘bar’:
rh1041336.c:10:3: warning: null argument where non-null required (argument 1) [-Wnonnull]
   return foo ((int *) 0);
   ^

Comment 3 Eric Blake 2013-12-12 19:52:06 UTC

(In reply to Jakub Jelinek from comment #2)
> int *
> bar (void)
> {
>   return foo ((int *) 0);

Nobody codes bugs that blatantly.  Here's the real problem:

$ cat foo.c
__attribute__((nonnull)) int *
foo (int *p)
{
  return p;
}

int *
bar (void)
{
  int *oops = 0;
  return foo (oops);
}
$ gcc -S -O2 -Wall foo.c
$ echo $?
0

It's trivial for a human to see that 'oops' is a NULL pointer, but gcc refuses to warn.  Coverity and clang catch it, gcc should too.

Comment 4 Eric Blake 2013-12-12 19:59:52 UTC

Where the warning would REALLY be helpful is in diagnosing cases where it is not as easy for the human to see a potential NULL deref, such as:

__attribute__((nonnull)) int *
foo (int *p)
{
  return p;
}

int *
bar (int *blah)
{
  int *other = foo (blah);
  if (!blah)
    abort()
  return other;
}

Coverity also catches this one under its FORWARD_NULL rule - the fact that we have code later in the function bar() that is conditional on whether 'blah' is NULL or not means that all earlier uses of 'blah' must be assumed to potentially have the value NULL, but passing NULL to foo() is a violation of the nonnull attribute, so either the 'if (!blah)' block is dead code or the 'foo(blah)' block is a bug; either way, a warning is warranted.  Where Coverity shines is when the provable potential for NULL is not immediately adjacent to the use of NULL; but the point remains that anywhere the compiler can prove an argument might be NULL, it should warn that the program may be violating the attribute(nonnull) on the function it is calling.  I'm okay if Coverity can prove more instances of NULL than gcc, but what I'm not okay with is how easy it is to write blatant NULL derefs that gcc won't flag.

Comment 5 Markus Armbruster 2013-12-13 07:46:51 UTC

Slightly tangential, but it bugs me too much to keep it to myself: gcc
is able and eager to recognize and *exploit* null-ness, but at the
same time fails to tell me about it.  Recently covered again in
LWN[*]:

    The second example is a null pointer dereference in the Linux
    kernel:

	struct tun_struct *tun = ...;
	struct sock *sk = tun->sk;
	if (!tun)
	    return POLLERR;
	/* write to address based on tun */

    Normally that code would cause a kernel oops if tun is null, but
    if page zero is mapped for some reason, the code is basically
    harmless --- as long as the test remains.  Because the compiler
    sees the dereference operation, it can conclude that the pointer
    is always non-null and remove the test entirely, which turns a
    fairly innocuous bug into a potential kernel exploit.

[*] http://lwn.net/Articles/575563/

Comment 6 Jakub Jelinek 2013-12-13 08:00:04 UTC

(In reply to Markus Armbruster from comment #5)
> Slightly tangential, but it bugs me too much to keep it to myself: gcc
> is able and eager to recognize and *exploit* null-ness, but at the
> same time fails to tell me about it.  Recently covered again in
> LWN[*]:
> 
>     The second example is a null pointer dereference in the Linux
>     kernel:
> 
> 	struct tun_struct *tun = ...;
> 	struct sock *sk = tun->sk;
> 	if (!tun)
> 	    return POLLERR;
> 	/* write to address based on tun */
> 
>     Normally that code would cause a kernel oops if tun is null, but
>     if page zero is mapped for some reason, the code is basically
>     harmless --- as long as the test remains.  Because the compiler
>     sees the dereference operation, it can conclude that the pointer
>     is always non-null and remove the test entirely, which turns a
>     fairly innocuous bug into a potential kernel exploit.
> 
> [*] http://lwn.net/Articles/575563/

This is very hard to warn about, the way GCC optimizes this is that the VRP pass, after seeing the pointer dereference, notes in the range info for the SSA_NAME that the range doesn't include 0, and much later the test is actually optimized, but there is no easy way to store why the range was the way it was,
ranges even for pointers can be influenced by tests (assertion on the test branches), arithmetics on the pointers etc., ranges can be copied between pointers and merged (union) on PHIs, etc.  Warning at the point where we add the assertion that the pointer is non-NULL after pointer dereference would warn pretty much on all code, you'd want to warn only about when you actually optimize something based on it, but as I said, it is close to impossible to track the reason for why the range has been computed that way.  Not to mention that in GCC 4.9, the ranges are preserved through optimization passes, so while in <= 4.8 only the VRP pass could optimize based on the range info, in 4.9+ any following pass can optimize based on it too.

Comment 7 Jeff Law 2013-12-13 17:36:01 UTC

Jakub's responses are dead-on correct.

Also note that more modern versions of GCC (4.9 and beyond) will use information gleamed from the non-null and returns-nonnull attribute to do even more optimizations.  Specifically paths where the constraints implied by those attributes are violated will be isolated and traps inserted.

For that specific transformation (isolation and trapping) I expect there will be optional warnings.

See upstream #16351 (link in external trackers).

Comment 8 Marek Polacek 2013-12-13 17:42:10 UTC

Note that we might implement runtime checks for this, that would be a part of -fsanitize=undefined.  But this has first to be done in the LLVM repo.  Supposedly it be done for 4.9.

Comment 9 Jakub Jelinek 2013-12-17 15:26:49 UTC

Right now the warning is in the frontends.  Warning later on for this has tons of issues, sometimes compiler threads jumps or performs similar optimizations where it will copy and simplify some portions of code that you could end up with foo (0) calls in the code, the compiler couldn't prove those paths are unreachable and if the warning would be late, it would warn, when there is nothing wrong with the original code, there would be no way to avoid that.
Also, with some programming techniques, say using macros or templates heavily, one could end up with tons of false positives.

What would actually IMHO make sense instead is for -fsanitize=undefined add code to report misuses of the nonnull attribute (and returns_nonnull attribute) at runtime.  This is new thing in GCC 4.9 though, and right now the libubsan library we are using doesn't have warning code for this and the library unfortunately has quite problematic upstream in llvm, so it might be difficult to get it accepted there, but we can at least try.  -fsanitize=undefined already has warnings about __builtin_unreachable () being reached and similar other problems (signed integer overflow, etc.).

Comment 10 Markus Armbruster 2013-12-17 16:35:32 UTC

I'm all for giving optimizers license (I used to work on one), but
when they outrun the toolchain's ability to catch programming errors,
it can only end in tears.

Static checks are best, of course.  This is what Eric asked for.

But run-time checks would be so much better than nothing.  In
particular when they change "optimizer detected your program violates
some arcane stipulation of the standard, and turned it into security
hole for you (but a really fast one, and besides, it's your own
fault!)"  into "optimizer detected... and turned it into something
that safely crashes".