http://koji.fedoraproject.org/koji/taskinfo?taskID=4216955 mainloop.c: In function 'stp_main_loop': mainloop.c:581:3: error: signed and unsigned type in conditional expression [-Werror=sign-compare] mainloop.c:581:3: error: signed and unsigned type in conditional expression [-Werror=sign-compare] mainloop.c:583:35: error: signed and unsigned type in conditional expression [-Werror=sign-compare] mainloop.c:583:35: error: signed and unsigned type in conditional expression [-Werror=sign-compare] mainloop.c:631:2: error: signed and unsigned type in conditional expression [-Werror=sign-compare] mainloop.c:631:2: error: signed and unsigned type in conditional expression [-Werror=sign-compare] cc1: all warnings being treated as errors make[3]: *** [mainloop.o] Error 1
The prime suspect between our June 17 build and your scratch build is gcc 4.7.0 vs. 4.7.1. I'll double-check and fix these errors.
(In reply to comment #1) > The prime suspect between our June 17 build and your scratch build is gcc > 4.7.0 vs. 4.7.1. I'll double-check and fix these errors. Thanks josh, I suspected as much.
Hmm, actually, this may be a glibc issue, as these are in FD_SET and FD_ISSET: 581 FD_SET(control_channel, &fds); 582 res = select(control_channel + 1, NULL, NULL, &fds, &tv); 583 select_supported = (res == 1 && FD_ISSET(control_channel, &fds)); 631 FD_SET(control_channel, &fds); Note control_channel and fds are the correct types, int and fd_set. That was 2.15.90-16; the fresh glibc-2.16-1 doesn't solve it either. http://koji.fedoraproject.org/koji/buildinfo?buildID=329214 However, downgrading to the prior build's glibc-2.15.90-12 does work: http://koji.fedoraproject.org/koji/buildinfo?buildID=324754
Ok, a broken-down source: #include <sys/select.h> #include <linux/types.h> int foo(void) { fd_set fds; FD_ZERO(&fds); FD_SET(0, &fds); return FD_ISSET(0, &fds); } > $ gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps > foo.c: In function ‘foo’: > foo.c:8:162: error: signed and unsigned type in conditional expression > [-Werror=sign-compare] > foo.c:8:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare] > foo.c:9:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare] > foo.c:9:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare] > cc1: all warnings being treated as errors Including linux/types.h is not necessary for this program, but it causes the issue at hand. Reversing the #includes also removes the error. I'll attach a few .i results, then this is going to glibc.
Created attachment 596487 [details] Working result with glibc-2.15.90-12.fc18.i686
Created attachment 596488 [details] Failing result with glibc-2.16-1.fc18.i686
Created attachment 596489 [details] Working result with glibc-2.16-1.fc18.i686, headers reversed
Thanks for the testcode & .i files.. Not sure who'll ultimately own it, but there should be enough bits to sort out what's going on.
Is there some reason you're including linux/types.h? It seems to include linux/posix_types which defines an alternate FD_ELT (among other things). Unless there's a good reason, I'd suggest you avoid including linux/types.h. Meanwhile I'll open an upstream report on the problems wiht linux/posix_types.h
kernel-headers actually provides the problematical posix_types.h header file. Reassigning to kernel team. At the least to avoid this problem linux/posix_types.h should use this definition of __NFDBITS: #define __NFDBITS (8 * (int) sizeof(unsigned long)) However, one could easily argue that all the __NFD_*, and FD_* macros in posix_types.h should just be zapped in favor of those which are in glibc and which work with FORTIFY_SOURCE.
(In reply to comment #10) > kernel-headers actually provides the problematical posix_types.h header > file. Reassigning to kernel team. > > At the least to avoid this problem linux/posix_types.h should use this > definition of __NFDBITS: > > #define __NFDBITS (8 * (int) sizeof(unsigned long)) > > However, one could easily argue that all the __NFD_*, and FD_* macros in > posix_types.h should just be zapped in favor of those which are in glibc and > which work with FORTIFY_SOURCE. That file has existed unaltered in the kernel since at least the first git based kernel release, which is 2.6.12-rc2. I'm not sure why all of a sudden with glibc-2.16 it needs to change?
The original file that fails, which is systemtap-1.8/runtime/staprun/mainloop.c doesn't include posix_types.h directly at all. It simply includes <sys/select.h> which is provided by glibc. That leads me to believe something in glibc changed that was protecting userspace from this and now isn't. Particularly since glibc-2.15.90 seems to still work with the same source. Have I overlooked something?
The diff of the working and non-working preprocessed files shows this: --- /home/jwboyer/foo.i-glibc-2.15.90-12.fc18.i686 2012-07-06 16:04:44.779169152 -0400 +++ /home/jwboyer/foo.i-glibc-2.16-1.fc18.i686 2012-07-06 16:04:50.810180176 -0400 @@ -236,8 +236,8 @@ extern int pselect (int __nfds, fd_set * # 1 "/usr/include/bits/select2.h" 1 3 4 # 24 "/usr/include/bits/select2.h" 3 4 -extern unsigned long int __fdelt_chk (unsigned long int __d); -extern unsigned long int __fdelt_warn (unsigned long int __d) +extern long int __fdelt_chk (long int __d); +extern long int __fdelt_warn (long int __d) __attribute__((__warning__ ("bit outside of fd_set selected"))); # 129 "/usr/include/sys/select.h" 2 3 4 @@ -400,6 +400,6 @@ int foo(void) { fd_set fds; do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " "stosl" : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&((&fds)->__fds_bits)[0]) : "memory"); } while (0); - ((void) (((&fds)->__fds_bits)[__extension__ ({ unsigned long int __d = (0); (__builtin_constant_p (__d) ? (__d >= 1024 ? __fdelt_warn (__d) : (__d / (8 * sizeof(unsigned long)))) : __fdelt_chk (__d)); })] |= ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long)))))); - return ((((&fds)->__fds_bits)[__extension__ ({ unsigned long int __d = (0); (__builtin_constant_p (__d) ? (__d >= 1024 ? __fdelt_warn (__d) : (__d / (8 * sizeof(unsigned long)))) : __fdelt_chk (__d)); })] & ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))) != 0); + ((void) (((&fds)->__fds_bits)[__extension__ ({ long int __d = (0); (__builtin_constant_p (__d) ? (0 <= __d && __d < 1024 ? (__d / (8 * sizeof(unsigned long))) : __fdelt_warn (__d)) : __fdelt_chk (__d)); })] |= ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long)))))); + return ((((&fds)->__fds_bits)[__extension__ ({ long int __d = (0); (__builtin_constant_p (__d) ? (0 <= __d && __d < 1024 ? (__d / (8 * sizeof(unsigned long))) : __fdelt_warn (__d)) : __fdelt_chk (__d)); })] & ((__fd_mask) 1 << ((0) % (8 * sizeof(unsigned long))))) != 0); } Basically the definition of __fdelt_chk changed from unsigned long, to long. That comes from glibc commit: commit ceb9e56b3d1f8c1922e0526c2e841373843460e2 Author: Paul Pluzhnikov <ppluzhnikov> Date: Wed Jun 13 09:57:18 2012 -0700 Suppress sign-conversion warning from FD_SET. [BZ #14210] See <http://sourceware.org/ml/libc-alpha/2012-05/msg01794.html>. * debug/fdelt_chk.c (__fdelt_chk): Accept and return long int, not unsigned long int. * misc/bits/select2.h (__fdelt_chk, __fdelt_warn, __FD_ELT): Likewise. Isn't that more likely the culprit than the posix_types.h file from the kernel, considering misc/sys/select.h in glibc already seems to guard against the linux/posix_types.h definitions around line 56?
Josh, look for something including linux/types rather than posix_types. posix_types provides an alternate and incompatible definition of NFDBITS (differs from the select.h version in signedness)
(In reply to comment #14) > Josh, look for something including linux/types rather than posix_types. > The failing stap code includes "staprun.h" which includes linux/types.h. That's all wonderful, but doesn't really explain comment #15. > posix_types provides an alternate and incompatible definition of NFDBITS > (differs from the select.h version in signedness) Except as I said in comment #15, select.h from glibc already guards against that: /* Some versions of <linux/posix_types.h> define this macros. */ #undef __NFDBITS /* It's easier to assume 8-bit bytes than to get CHAR_BIT. */ #define __NFDBITS (8 * (int) sizeof (__fd_mask)) And since the stap code winds up including linux/types.h _before_ it includes sys/select.h, that #undef __NFDBITS should already prevent posix_types.h from having any particular impact, shouldn't it? The diff I put in comment #15 was the complete diff of the .i files in question (working 2.15.90 vs. failing 2.16). So I'm really confused how something that has been working basically forever, but only fails with a newer glibc is at fault here.
sigh. All "comment #15" comments above should have been "comment #13"
(In reply to comment #15) > And since the stap code winds up including linux/types.h _before_ it > includes sys/select.h, that #undef __NFDBITS should already prevent > posix_types.h from having any particular impact, shouldn't it? Indirectly, it comes much earlier, via stdlib.h:319 -> sys/types.h:219 -> sys/select.h. I'll attach a full mainloop.i. I can solve stap by including linux/types.h above all else in staprun.h, but of course it'd be better in general to resolve the conflict. > The diff I put in comment #15 was the complete diff of the .i files in > question (working 2.15.90 vs. failing 2.16). So I'm really confused how > something that has been working basically forever, but only fails with a > newer glibc is at fault here. Look also at the tail of the diff between "failing 2.16" and "working 2.16, reversed". The implementations look quite different, and so whatever sign changes glibc was attempting are apparantly disrupted.
Created attachment 596690 [details] Full mainloop.i, failing with glibc-2.16-1.fc18.i686
(In reply to comment #9) > Is there some reason you're including linux/types.h? Some historical artifact, I'm sure. At least on rawhide, it compiles fine without having that #include at all, but I'd need to check across more distros before removing it in upstream stap. Still, IMO we should not allow such a conflict between such core system headers.
(In reply to comment #17) > (In reply to comment #15) > > And since the stap code winds up including linux/types.h _before_ it > > includes sys/select.h, that #undef __NFDBITS should already prevent > > posix_types.h from having any particular impact, shouldn't it? > > Indirectly, it comes much earlier, via stdlib.h:319 -> sys/types.h:219 -> > sys/select.h. I'll attach a full mainloop.i. OK. I'm forgetting that headers are wrapped in #ifndef <header define> #endif, so if something includes sys/select.h before linux/posix_types.h gets brought in the latter definition is used. sys/select.h seems to be included by some other header before staprun.h includes linux/types.h in mainloop.i. > I can solve stap by including linux/types.h above all else in staprun.h, but > of course it'd be better in general to resolve the conflict. Yeah, that explains things in reference to my above comment. > > The diff I put in comment #15 was the complete diff of the .i files in > > question (working 2.15.90 vs. failing 2.16). So I'm really confused how > > something that has been working basically forever, but only fails with a > > newer glibc is at fault here. > > Look also at the tail of the diff between "failing 2.16" and "working 2.16, > reversed". The implementations look quite different, and so whatever sign > changes glibc was attempting are apparantly disrupted. Yeah. Even with the explanation on why the posix_types.h definition is being used figured out, I'm still not sure why something that worked up until now between both the kernel and glibc is the kernel's fault when glibc made the change.
(In reply to comment #20) > Yeah. Even with the explanation on why the posix_types.h definition is > being used figured out, I'm still not sure why something that worked up > until now between both the kernel and glibc is the kernel's fault when glibc > made the change. I don't know about fault, per se. It appears that these have always been stomping on each other, just luckily in compatible ways. It looks like this was partially separated in glibc commit e529793b: > commit e529793b508f47b5fb3a0a8bf5a1465a93a51dcd > Author: Andreas Schwab <schwab> > Date: Tue Sep 13 14:41:37 2011 +0200 > > Avoid macro clash between <sys/select.h> and <linux/posix_types.h> ... > diff --git a/misc/sys/select.h b/misc/sys/select.h > index 295d3bb..97f2b3d 100644 > --- a/misc/sys/select.h > +++ b/misc/sys/select.h > @@ -54,14 +54,12 @@ typedef __suseconds_t suseconds_t; > /* The fd_set member is required to be an array of longs. */ > typedef long int __fd_mask; > > -/* Some versions of <linux/posix_types.h> define these macros. */ > +/* Some versions of <linux/posix_types.h> define this macros. */ > #undef __NFDBITS > -#undef __FDELT > -#undef __FDMASK > /* It's easier to assume 8-bit bytes than to get CHAR_BIT. */ > #define __NFDBITS (8 * (int) sizeof (__fd_mask)) > -#define __FDELT(d) ((d) / __NFDBITS) > -#define __FDMASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) > +#define __FD_ELT(d) ((d) / __NFDBITS) > +#define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) > > /* fd_set for select and pselect. */ > typedef struct So perhaps that separation should be extended to __NFDBITS too?
(In reply to comment #21) > (In reply to comment #20) > > Yeah. Even with the explanation on why the posix_types.h definition is > > being used figured out, I'm still not sure why something that worked up > > until now between both the kernel and glibc is the kernel's fault when glibc > > made the change. "the change" does indeed seem to be the commit I hightlighted in comment #13. I built a local glibc-2.16-2 with that single commit reverted, and the example foo.c code builds fine again: [root@localhost ~]# rpm -q gcc glibc gcc-4.7.1-1.fc18.x86_64 glibc-2.16-2.fc18.x86_64 [root@localhost ~]# grep "#define __NFDBITS" /usr/include/sys/select.h #define __NFDBITS (8 * (int) sizeof (__fd_mask)) [root@localhost ~]# grep "#define __NFDBITS" /usr/include/linux/posix_types.h #define __NFDBITS (8 * sizeof(unsigned long)) [root@localhost ~]# cat foo.c #include <sys/select.h> #include <linux/types.h> int foo(void) { fd_set fds; FD_ZERO(&fds); FD_SET(0, &fds); return FD_ISSET(0, &fds); } [root@localhost ~]# gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps foo.c: In function ‘foo’: foo.c:8:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare] foo.c:8:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare] foo.c:9:162: error: signed and unsigned type in conditional expression [-Werror=sign-compare] foo.c:9:184: error: signed and unsigned type in conditional expression [-Werror=sign-compare] cc1: all warnings being treated as errors [root@localhost ~]# rpm -Uvh --oldpackage ./glibc-common-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-headers-2.16-2.bz837641.1.fc18.x86_64.rpm glibc-devel-2.16-2.bz837641.1.fc18.x86_64.rpm Preparing... ########################################### [100%] 1:glibc ########################################### [ 25%] 2:glibc-common ########################################### [ 50%] 3:glibc-headers ########################################### [ 75%] 4:glibc-devel ########################################### [100%] [root@localhost ~]# gcc -Wextra -Werror -O2 -D_FORTIFY_SOURCE=2 -c foo.c -save-temps [root@localhost ~]# grep "#define __NFDBITS" /usr/include/sys/select.h #define __NFDBITS (8 * (int) sizeof (__fd_mask)) [root@localhost ~]# grep "#define __NFDBITS" /usr/include/linux/posix_types.h #define __NFDBITS (8 * sizeof(unsigned long)) [root@localhost ~]# > I don't know about fault, per se. It appears that these have always been > stomping on each other, just luckily in compatible ways. Right, "fault" has perhaps the wrong connotations to it. I just meant "component that needs to fix things". > It looks like this was partially separated in glibc commit e529793b: > > > commit e529793b508f47b5fb3a0a8bf5a1465a93a51dcd > > Author: Andreas Schwab <schwab> > > Date: Tue Sep 13 14:41:37 2011 +0200 > > > > Avoid macro clash between <sys/select.h> and <linux/posix_types.h> <snip> > So perhaps that separation should be extended to __NFDBITS too? Yeah, that might be the best all around solution. Adding an (int) cast to the definition in posix_types.h seems trivial, however given that it has been the way it is for so long I have no idea if anyone is relying on that definition or not. We'd likely need much stronger reasoning than "glibc changed somehow and this breaks", particularly when there are systems that use Linux but do not use glibc (Android being the most notable one). I'm really not trying to be difficult, but changing the kernel header based on the information and symptoms in this bug seems tenuous thus far.
I patched the linux/types.h include out of staprun.h; scratch build here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4227867 With that and the simplified test in comment #4, SystemTap doesn't need to be involved anymore, so I'm changing the subject to the root __NFDBITS conflict.
Any update on the resolution of this?
We're going to move this back to glibc in light of comment #22. If the kernel needs to be changed, it needs to be changed upstream and not carried as a patch in Fedora.
No, this is not a glibc problem and if I try to get it fixed upstream it's just going to get rejected and I'm not going to carry it as a fedora-local change as doing so will just trade one set of warnings for another. Please fix the kernel headers. This is not a glibc problem. Jeff
(In reply to comment #26) > No, this is not a glibc problem and if I try to get it fixed upstream it's > just going to get rejected and I'm not going to carry it as a fedora-local > change as doing so will just trade one set of warnings for another. > > Please fix the kernel headers. This is not a glibc problem. Providing us with a valid reasoning on why the kernel headers are now incorrect (or always have been) would go a long way towards getting that accomplished.
Jeff, rather than us playing middle-man, you going directly to Linux-kernel with this is going to be the fastest way to get this resolved.
*** Bug 840902 has been marked as a duplicate of this bug. ***
FYI, as in bug #840902 comment #6, it appears harder to avoid including linux/posix_types.h on some archs. On PPC you can get there by: /usr/include/signal.h:338 -> /usr/include/bits/sigcontext.h:27 -> /usr/include/asm/sigcontext.h:11 -> /usr/include/asm/ptrace.h:27 -> /usr/include/linux/types.h:8 -> /usr/include/linux/posix_types.h The first two are glibc, and the rest are kernel-headers.
*** Bug 841571 has been marked as a duplicate of this bug. ***
http://article.gmane.org/gmane.linux.kernel/1332500
(In reply to comment #32) > http://article.gmane.org/gmane.linux.kernel/1332500 That one fixes the issue, but Linus wanted to go further and just remove the macro(s) entirely. Resulted in: http://article.gmane.org/gmane.linux.kernel/1333023
The patch went upstream as commit 8ded2bb. It should be in rawhide tomorrow with kernel-3.6.0-0.rc0.git4.1.fc18