837641 – __NFDBITS conflict between sys/select.h and linux/posix_types.h

Bug 837641 - __NFDBITS conflict between sys/select.h and linux/posix_types.h

Summary: __NFDBITS conflict between sys/select.h and linux/posix_types.h

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	841571 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2012-07-04 14:24 UTC by Peter Robinson
Modified:	2012-07-31 16:09 UTC (History)
CC List:	17 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2012-07-31 16:09:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Working result with glibc-2.15.90-12.fc18.i686 (9.05 KB, text/plain) 2012-07-05 19:47 UTC, Josh Stone	no flags	Details
Failing result with glibc-2.16-1.fc18.i686 (9.02 KB, text/plain) 2012-07-05 19:48 UTC, Josh Stone	no flags	Details
Working result with glibc-2.16-1.fc18.i686, headers reversed (8.52 KB, text/plain) 2012-07-05 19:49 UTC, Josh Stone	no flags	Details
Full mainloop.i, failing with glibc-2.16-1.fc18.i686 (278.26 KB, text/plain) 2012-07-06 22:00 UTC, Josh Stone	no flags	Details
View All

Description Peter Robinson 2012-07-04 14:24:13 UTC

http://koji.fedoraproject.org/koji/taskinfo?taskID=4216955

mainloop.c: In function 'stp_main_loop':
mainloop.c:581:3: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
mainloop.c:581:3: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
mainloop.c:583:35: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
mainloop.c:583:35: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
mainloop.c:631:2: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
mainloop.c:631:2: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
cc1: all warnings being treated as errors
make[3]: *** [mainloop.o] Error 1

Comment 1 Josh Stone 2012-07-05 17:43:55 UTC

The prime suspect between our June 17 build and your scratch build is gcc 4.7.0 vs. 4.7.1.  I'll double-check and fix these errors.

Comment 2 Peter Robinson 2012-07-05 17:49:39 UTC

(In reply to comment #1)
> The prime suspect between our June 17 build and your scratch build is gcc
> 4.7.0 vs. 4.7.1.  I'll double-check and fix these errors.

Thanks josh, I suspected as much.

Comment 3 Josh Stone 2012-07-05 18:36:07 UTC

Hmm, actually, this may be a glibc issue, as these are in FD_SET and FD_ISSET:

581   FD_SET(control_channel, &fds);
582   res = select(control_channel + 1, NULL, NULL, &fds, &tv);
583   select_supported = (res == 1 && FD_ISSET(control_channel, &fds));

631 	FD_SET(control_channel, &fds);

Note control_channel and fds are the correct types, int and fd_set.

That was 2.15.90-16; the fresh glibc-2.16-1 doesn't solve it either.
http://koji.fedoraproject.org/koji/buildinfo?buildID=329214

However, downgrading to the prior build's glibc-2.15.90-12 does work:
http://koji.fedoraproject.org/koji/buildinfo?buildID=324754

Comment 4 Josh Stone 2012-07-05 19:46:06 UTC

Ok, a broken-down source:

#include <sys/select.h>
#include <linux/types.h>

int foo(void)
{
  fd_set fds;
  FD_ZERO(&fds);
  FD_SET(0, &fds);
  return FD_ISSET(0, &fds);
}

> $ gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps
> foo.c: In function ‘foo’:
> foo.c:8:162: error: signed and unsigned type in conditional expression > [-Werror=sign-compare]
> foo.c:8:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
> foo.c:9:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
> foo.c:9:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
> cc1: all warnings being treated as errors

Including linux/types.h is not necessary for this program, but it causes the issue at hand.  Reversing the #includes also removes the error.

I'll attach a few .i results, then this is going to glibc.

Comment 5 Josh Stone 2012-07-05 19:47:58 UTC

Created attachment 596487 [details]
Working result with glibc-2.15.90-12.fc18.i686

Comment 6 Josh Stone 2012-07-05 19:48:42 UTC

Created attachment 596488 [details]
Failing result with glibc-2.16-1.fc18.i686

Comment 7 Josh Stone 2012-07-05 19:49:48 UTC

Created attachment 596489 [details]
Working result with glibc-2.16-1.fc18.i686, headers reversed

Comment 8 Jeff Law 2012-07-06 05:42:32 UTC

Thanks for the testcode & .i files..  Not sure who'll ultimately own it, but there should be enough bits to sort out what's going on.

Comment 9 Jeff Law 2012-07-06 18:06:36 UTC

Is there some reason you're including linux/types.h?

It seems to include linux/posix_types which defines an alternate FD_ELT (among other things).

Unless there's a good reason, I'd suggest you avoid including linux/types.h.  Meanwhile I'll open an upstream report on the problems wiht linux/posix_types.h

Comment 10 Jeff Law 2012-07-06 18:20:31 UTC

kernel-headers actually provides the problematical posix_types.h header file.  Reassigning to kernel team.

At the least to avoid this problem linux/posix_types.h should use this definition of __NFDBITS:

#define __NFDBITS       (8 * (int) sizeof(unsigned long))

However, one could easily argue that all the __NFD_*, and FD_* macros in posix_types.h should just be zapped in favor of those which are in glibc and which work with FORTIFY_SOURCE.

Comment 11 Josh Boyer 2012-07-06 19:40:33 UTC

(In reply to comment #10)
> kernel-headers actually provides the problematical posix_types.h header
> file.  Reassigning to kernel team.
> 
> At the least to avoid this problem linux/posix_types.h should use this
> definition of __NFDBITS:
> 
> #define __NFDBITS       (8 * (int) sizeof(unsigned long))
> 
> However, one could easily argue that all the __NFD_*, and FD_* macros in
> posix_types.h should just be zapped in favor of those which are in glibc and
> which work with FORTIFY_SOURCE.

That file has existed unaltered in the kernel since at least the first git based kernel release, which is 2.6.12-rc2.  I'm not sure why all of a sudden with glibc-2.16 it needs to change?

Comment 12 Josh Boyer 2012-07-06 19:53:15 UTC

The original file that fails, which is systemtap-1.8/runtime/staprun/mainloop.c doesn't include posix_types.h directly at all.  It simply includes <sys/select.h> which is provided by glibc.  That leads me to believe something in glibc changed that was protecting userspace from this and now isn't.  Particularly since glibc-2.15.90 seems to still work with the same source.

Have I overlooked something?

Comment 13 Josh Boyer 2012-07-06 20:09:26 UTC

The diff of the working and non-working preprocessed files shows this:

--- /home/jwboyer/foo.i-glibc-2.15.90-12.fc18.i686	2012-07-06 16:04:44.779169152 -0400
+++ /home/jwboyer/foo.i-glibc-2.16-1.fc18.i686	2012-07-06 16:04:50.810180176 -0400
@@ -236,8 +236,8 @@ extern int pselect (int __nfds, fd_set *
 
 # 1 "/usr/include/bits/select2.h" 1 3 4
 # 24 "/usr/include/bits/select2.h" 3 4
-extern unsigned long int __fdelt_chk (unsigned long int __d);
-extern unsigned long int __fdelt_warn (unsigned long int __d)
+extern long int __fdelt_chk (long int __d);
+extern long int __fdelt_warn (long int __d)
   __attribute__((__warning__ ("bit outside of fd_set selected")));
 # 129 "/usr/include/sys/select.h" 2 3 4
 
@@ -400,6 +400,6 @@ int foo(void)
 {
   fd_set fds;
   do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " "stosl" : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&((&fds)->__fds_bits)[0]) : "memory"); } while (0);
-  ((void) (((&fds)->__fds_bits)[__extension__ ({ unsigned long int __d = (0); (__builtin_constant_p (__d) ? (__d >= 1024 ? __fdelt_warn (__d) : (__d / (8 * sizeof(unsigned long)))) : __fdelt_chk (__d)); })] |= ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))));
-  return ((((&fds)->__fds_bits)[__extension__ ({ unsigned long int __d = (0); (__builtin_constant_p (__d) ? (__d >= 1024 ? __fdelt_warn (__d) : (__d / (8 * sizeof(unsigned long)))) : __fdelt_chk (__d)); })] & ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))) != 0);
+  ((void) (((&fds)->__fds_bits)[__extension__ ({ long int __d = (0); (__builtin_constant_p (__d) ? (0 <= __d && __d < 1024 ? (__d / (8 * sizeof(unsigned long))) : __fdelt_warn (__d)) : __fdelt_chk (__d)); })] |= ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))));
+  return ((((&fds)->__fds_bits)[__extension__ ({ long int __d = (0); (__builtin_constant_p (__d) ? (0 <= __d && __d < 1024 ? (__d / (8 * sizeof(unsigned long))) : __fdelt_warn (__d)) : __fdelt_chk (__d)); })] & ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))) != 0);
 }

Basically the definition of __fdelt_chk changed from unsigned long, to long.  That comes from glibc commit:

commit ceb9e56b3d1f8c1922e0526c2e841373843460e2
Author: Paul Pluzhnikov <ppluzhnikov>
Date:   Wed Jun 13 09:57:18 2012 -0700

    Suppress sign-conversion warning from FD_SET.
    
    [BZ #14210] See <http://sourceware.org/ml/libc-alpha/2012-05/msg01794.html>.
    * debug/fdelt_chk.c (__fdelt_chk): Accept and return long int,
    not unsigned long int.
    * misc/bits/select2.h (__fdelt_chk, __fdelt_warn, __FD_ELT): Likewise.



Isn't that more likely the culprit than the posix_types.h file from the kernel, considering misc/sys/select.h in glibc already seems to guard against the linux/posix_types.h definitions around line 56?

Comment 14 Jeff Law 2012-07-06 20:44:28 UTC

Josh, look for something including linux/types rather than posix_types.


posix_types provides an alternate and incompatible definition of NFDBITS (differs from the select.h version in signedness)

Comment 15 Josh Boyer 2012-07-06 21:10:44 UTC

(In reply to comment #14)
> Josh, look for something including linux/types rather than posix_types.
> 

The failing stap code includes "staprun.h" which includes linux/types.h.  That's all wonderful, but doesn't really explain comment #15.

> posix_types provides an alternate and incompatible definition of NFDBITS
> (differs from the select.h version in signedness)

Except as I said in comment #15, select.h from glibc already guards against that:

/* Some versions of <linux/posix_types.h> define this macros.  */
#undef  __NFDBITS
/* It's easier to assume 8-bit bytes than to get CHAR_BIT.  */
#define __NFDBITS       (8 * (int) sizeof (__fd_mask))

And since the stap code winds up including linux/types.h _before_ it includes sys/select.h, that #undef __NFDBITS should already prevent posix_types.h from having any particular impact, shouldn't it?

The diff I put in comment #15 was the complete diff of the .i files in question (working 2.15.90 vs. failing 2.16).  So I'm really confused how something that has been working basically forever, but only fails with a newer glibc is at fault here.

Comment 16 Josh Boyer 2012-07-06 21:19:22 UTC

sigh.  All "comment #15" comments above should have been "comment #13"

Comment 17 Josh Stone 2012-07-06 21:57:31 UTC

(In reply to comment #15)
> And since the stap code winds up including linux/types.h _before_ it
> includes sys/select.h, that #undef __NFDBITS should already prevent
> posix_types.h from having any particular impact, shouldn't it?

Indirectly, it comes much earlier, via stdlib.h:319 -> sys/types.h:219 -> sys/select.h.  I'll attach a full mainloop.i.

I can solve stap by including linux/types.h above all else in staprun.h, but of course it'd be better in general to resolve the conflict.

> The diff I put in comment #15 was the complete diff of the .i files in
> question (working 2.15.90 vs. failing 2.16).  So I'm really confused how
> something that has been working basically forever, but only fails with a
> newer glibc is at fault here.

Look also at the tail of the diff between "failing 2.16" and "working 2.16, reversed".  The implementations look quite different, and so whatever sign changes glibc was attempting are apparantly disrupted.

Comment 18 Josh Stone 2012-07-06 22:00:52 UTC

Created attachment 596690 [details]
Full mainloop.i, failing with glibc-2.16-1.fc18.i686

Comment 19 Josh Stone 2012-07-06 22:13:02 UTC

(In reply to comment #9)
> Is there some reason you're including linux/types.h?

Some historical artifact, I'm sure.  At least on rawhide, it compiles fine without having that #include at all, but I'd need to check across more distros before removing it in upstream stap.

Still, IMO we should not allow such a conflict between such core system headers.

Comment 20 Josh Boyer 2012-07-06 22:24:05 UTC

(In reply to comment #17)
> (In reply to comment #15)
> > And since the stap code winds up including linux/types.h _before_ it
> > includes sys/select.h, that #undef __NFDBITS should already prevent
> > posix_types.h from having any particular impact, shouldn't it?
> 
> Indirectly, it comes much earlier, via stdlib.h:319 -> sys/types.h:219 ->
> sys/select.h.  I'll attach a full mainloop.i.

OK.  I'm forgetting that headers are wrapped in #ifndef <header define> #endif, so if something includes sys/select.h before linux/posix_types.h gets brought in the latter definition is used.  sys/select.h seems to be included by some other header before staprun.h includes linux/types.h in mainloop.i.

> I can solve stap by including linux/types.h above all else in staprun.h, but
> of course it'd be better in general to resolve the conflict.

Yeah, that explains things in reference to my above comment.

> > The diff I put in comment #15 was the complete diff of the .i files in
> > question (working 2.15.90 vs. failing 2.16).  So I'm really confused how
> > something that has been working basically forever, but only fails with a
> > newer glibc is at fault here.
> 
> Look also at the tail of the diff between "failing 2.16" and "working 2.16,
> reversed".  The implementations look quite different, and so whatever sign
> changes glibc was attempting are apparantly disrupted.

Yeah.  Even with the explanation on why the posix_types.h definition is being used figured out, I'm still not sure why something that worked up until now between both the kernel and glibc is the kernel's fault when glibc made the change.

Comment 21 Josh Stone 2012-07-06 22:39:37 UTC

(In reply to comment #20)
> Yeah.  Even with the explanation on why the posix_types.h definition is
> being used figured out, I'm still not sure why something that worked up
> until now between both the kernel and glibc is the kernel's fault when glibc
> made the change.

I don't know about fault, per se.  It appears that these have always been stomping on each other, just luckily in compatible ways.

It looks like this was partially separated in glibc commit e529793b:

> commit e529793b508f47b5fb3a0a8bf5a1465a93a51dcd
> Author: Andreas Schwab <schwab>
> Date:   Tue Sep 13 14:41:37 2011 +0200
> 
>     Avoid macro clash between <sys/select.h> and <linux/posix_types.h>
...
> diff --git a/misc/sys/select.h b/misc/sys/select.h
> index 295d3bb..97f2b3d 100644
> --- a/misc/sys/select.h
> +++ b/misc/sys/select.h
> @@ -54,14 +54,12 @@ typedef __suseconds_t suseconds_t;
>  /* The fd_set member is required to be an array of longs.  */
>  typedef long int __fd_mask;
> 
> -/* Some versions of <linux/posix_types.h> define these macros.  */
> +/* Some versions of <linux/posix_types.h> define this macros.  */
>  #undef __NFDBITS
> -#undef __FDELT
> -#undef __FDMASK
>  /* It's easier to assume 8-bit bytes than to get CHAR_BIT.  */
>  #define __NFDBITS      (8 * (int) sizeof (__fd_mask))
> -#define        __FDELT(d)      ((d) / __NFDBITS)
> -#define        __FDMASK(d)     ((__fd_mask) 1 << ((d) % __NFDBITS))
> +#define        __FD_ELT(d)     ((d) / __NFDBITS)
> +#define        __FD_MASK(d)    ((__fd_mask) 1 << ((d) % __NFDBITS))
> 
>  /* fd_set for select and pselect.  */
>  typedef struct

So perhaps that separation should be extended to __NFDBITS too?

Comment 22 Josh Boyer 2012-07-06 22:57:15 UTC

(In reply to comment #21)
> (In reply to comment #20)
> > Yeah.  Even with the explanation on why the posix_types.h definition is
> > being used figured out, I'm still not sure why something that worked up
> > until now between both the kernel and glibc is the kernel's fault when glibc
> > made the change.

"the change" does indeed seem to be the commit I hightlighted in comment #13.  I built a local glibc-2.16-2 with that single commit reverted, and the example foo.c code builds fine again:

[root@localhost ~]# rpm -q gcc glibc
gcc-4.7.1-1.fc18.x86_64
glibc-2.16-2.fc18.x86_64
[root@localhost ~]# grep "#define __NFDBITS" /usr/include/sys/select.h 
#define __NFDBITS	(8 * (int) sizeof (__fd_mask))
[root@localhost ~]# grep "#define __NFDBITS" /usr/include/linux/posix_types.h 
#define __NFDBITS	(8 * sizeof(unsigned long))
[root@localhost ~]# cat foo.c 
#include <sys/select.h>
#include <linux/types.h>

int foo(void)
{
  fd_set fds;
  FD_ZERO(&fds);
  FD_SET(0, &fds);
  return FD_ISSET(0, &fds);
}

[root@localhost ~]# gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps
foo.c: In function ‘foo’:
foo.c:8:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
foo.c:8:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
foo.c:9:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
foo.c:9:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare]
cc1: all warnings being treated as errors
[root@localhost ~]# rpm -Uvh --oldpackage ./glibc-common-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-headers-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-devel-2.16-2.bz837641.1.fc18.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:glibc                  ########################################### [ 25%]
   2:glibc-common           ########################################### [ 50%]
   3:glibc-headers          ########################################### [ 75%]
   4:glibc-devel            ########################################### [100%]
[root@localhost ~]# gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps
[root@localhost ~]# grep "#define __NFDBITS" /usr/include/sys/select.h 
#define __NFDBITS	(8 * (int) sizeof (__fd_mask))
[root@localhost ~]# grep "#define __NFDBITS" /usr/include/linux/posix_types.h #define __NFDBITS	(8 * sizeof(unsigned long))
[root@localhost ~]# 


> I don't know about fault, per se.  It appears that these have always been
> stomping on each other, just luckily in compatible ways.

Right, "fault" has perhaps the wrong connotations to it.  I just meant "component that needs to fix things".

> It looks like this was partially separated in glibc commit e529793b:
> 
> > commit e529793b508f47b5fb3a0a8bf5a1465a93a51dcd
> > Author: Andreas Schwab <schwab>
> > Date:   Tue Sep 13 14:41:37 2011 +0200
> > 
> >     Avoid macro clash between <sys/select.h> and <linux/posix_types.h>

<snip>

> So perhaps that separation should be extended to __NFDBITS too?

Yeah, that might be the best all around solution.  Adding an (int) cast to the definition in posix_types.h seems trivial, however given that it has been the way it is for so long I have no idea if anyone is relying on that definition or not.  We'd likely need much stronger reasoning than "glibc changed somehow and this breaks", particularly when there are systems that use Linux but do not use glibc (Android being the most notable one).

I'm really not trying to be difficult, but changing the kernel header based on the information and symptoms in this bug seems tenuous thus far.

Comment 23 Josh Stone 2012-07-09 20:04:45 UTC

I patched the linux/types.h include out of staprun.h; scratch build here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=4227867

With that and the simplified test in comment #4, SystemTap doesn't need to be involved anymore, so I'm changing the subject to the root __NFDBITS conflict.

Comment 24 Peter Robinson 2012-07-12 09:18:49 UTC

Any update on the resolution of this?

Comment 25 Josh Boyer 2012-07-16 15:10:47 UTC

We're going to move this back to glibc in light of comment #22.  If the kernel needs to be changed, it needs to be changed upstream and not carried as a patch in Fedora.

Comment 26 Jeff Law 2012-07-16 15:29:21 UTC

No, this is not a glibc problem and if I try to get it fixed upstream it's just going to get rejected and I'm not going to carry it as a fedora-local change as doing so will just trade one set of warnings for another.

Please fix the kernel headers.  This is not a glibc problem.



Jeff

Comment 27 Josh Boyer 2012-07-16 15:45:28 UTC

(In reply to comment #26)
> No, this is not a glibc problem and if I try to get it fixed upstream it's
> just going to get rejected and I'm not going to carry it as a fedora-local
> change as doing so will just trade one set of warnings for another.
> 
> Please fix the kernel headers.  This is not a glibc problem.

Providing us with a valid reasoning on why the kernel headers are now incorrect (or always have been) would go a long way towards getting that accomplished.

Comment 28 Dave Jones 2012-07-16 15:49:21 UTC

Jeff, rather than us playing middle-man, you going directly to Linux-kernel with this is going to be the fastest way to get this resolved.

Comment 29 Karsten Hopp 2012-07-17 14:43:42 UTC

*** Bug 840902 has been marked as a duplicate of this bug. ***

Comment 30 Josh Stone 2012-07-17 17:39:50 UTC

FYI, as in bug #840902 comment #6, it appears harder to avoid including linux/posix_types.h on some archs.  On PPC you can get there by:

  /usr/include/signal.h:338
  -> /usr/include/bits/sigcontext.h:27
  -> /usr/include/asm/sigcontext.h:11
  -> /usr/include/asm/ptrace.h:27
  -> /usr/include/linux/types.h:8
  -> /usr/include/linux/posix_types.h

The first two are glibc, and the rest are kernel-headers.

Comment 31 Josh Boyer 2012-07-24 18:12:38 UTC

*** Bug 841571 has been marked as a duplicate of this bug. ***

Comment 32 Josh Boyer 2012-07-24 18:14:24 UTC

http://article.gmane.org/gmane.linux.kernel/1332500

Comment 33 Josh Boyer 2012-07-25 19:24:22 UTC

(In reply to comment #32)
> http://article.gmane.org/gmane.linux.kernel/1332500

That one fixes the issue, but Linus wanted to go further and just remove the macro(s) entirely.  Resulted in:

http://article.gmane.org/gmane.linux.kernel/1333023

Comment 34 Josh Boyer 2012-07-30 21:13:04 UTC

The patch went upstream as commit 8ded2bb.  It should be in rawhide tomorrow with kernel-3.6.0-0.rc0.git4.1.fc18

Note You need to log in before you can comment on or make changes to this bug.