Description of problem:

While trying to get a third-party 32-bit program to run, I hit problems with setuid32. strace of the binary on an Ubuntu system reported:

[f7727425] fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[f7727425] setuid32(0) = 0
[f7727425] time(NULL) = 1405561829

while on my Rawhide box it dies mysteriously:

[f7fd5aa0] fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[f7fd5aa0] syscall_4157941504(0, 0xffffd628, 0xf7feeaa0, 0xf7d52700, 0xf7f94000, 0xf7f961c4) = -1 (errno 38)

Testing shows that it does 'setuid32(0) = 0' with 2.19.90-28 on the Rawhide box as well, and the binary functions just fine.

Where it goes astray, according to GDB (cutting-and-pasting pieces):

(gdb) disassemble __setuid,+64
Dump of assembler code from 0xf7e11f60 to 0xf7e11fa0:
   0xf7e11f60 <__setuid+0>: push %ebx
   0xf7e11f61 <__setuid+1>: call 0xf7e9080f <__x86.get_pc_thunk.bx>
   0xf7e11f66 <__setuid+6>: add $0x11509a,%ebx
   0xf7e11f6c <__setuid+12>: sub $0x28,%esp
   0xf7e11f6f <__setuid+15>: mov 0x3d18(%ebx),%eax
   0xf7e11f75 <__setuid+21>: mov 0x30(%esp),%edx
=> 0xf7e11f79 <__setuid+25>: test %eax,%eax
   0xf7e11f7b <__setuid+27>: jne 0xf7e11fa0 <__setuid+64>
   0xf7e11f7d <__setuid+29>: xchg %edx,%ebx
   0xf7e11f7f <__setuid+31>: mov $0xd5,%eax
   0xf7e11f84 <__setuid+36>: call *%gs:0x10
   0xf7e11f8b <__setuid+43>: xchg %edx,%ebx
   0xf7e11f8d <__setuid+45>: cmp $0xfffff000,%eax
   0xf7e11f92 <__setuid+50>: ja 0xf7e11fd0 <__setuid+112>
   0xf7e11f94 <__setuid+52>: add $0x28,%esp
   0xf7e11f97 <__setuid+55>: pop %ebx
   0xf7e11f98 <__setuid+56>: ret
   0xf7e11f99 <__setuid+57>: lea 0x0(%esi,%eiz,1),%esi
End of assembler dump.
(gdb) stepi
0xf7e11f7b 29 result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) info registers
eax            0x1          1
ecx            0x53c6e214   1405542932
edx            0x0          0
ebx            0xf7f27000   -135106560
esp            0xffffd4f0   0xffffd4f0
ebp            0xffffd638   0xffffd638
esi            0x0          0
edi            0x804f200    134541824
eip            0xf7e11f7b   0xf7e11f7b <__setuid+27>
eflags         0x202        [ IF ]
cs             0x23         35
ss             0x2b         43
ds             0x2b         43
es             0x2b         43
fs             0x0          0
gs             0x63         99
(gdb) stepi
29 result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) stepi
0xf7e11fa8 29 result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) disassemble __setuid,+128
Dump of assembler code from 0xf7e11f60 to 0xf7e11fe0:
   0xf7e11f60 <__setuid+0>: push %ebx
   0xf7e11f61 <__setuid+1>: call 0xf7e9080f <__x86.get_pc_thunk.bx>
   0xf7e11f66 <__setuid+6>: add $0x11509a,%ebx
   0xf7e11f6c <__setuid+12>: sub $0x28,%esp
   0xf7e11f6f <__setuid+15>: mov 0x3d18(%ebx),%eax
   0xf7e11f75 <__setuid+21>: mov 0x30(%esp),%edx
   0xf7e11f79 <__setuid+25>: test %eax,%eax
   0xf7e11f7b <__setuid+27>: jne 0xf7e11fa0 <__setuid+64>
   0xf7e11f7d <__setuid+29>: xchg %edx,%ebx
   0xf7e11f7f <__setuid+31>: mov $0xd5,%eax
   0xf7e11f84 <__setuid+36>: call *%gs:0x10
   0xf7e11f8b <__setuid+43>: xchg %edx,%ebx
   0xf7e11f8d <__setuid+45>: cmp $0xfffff000,%eax
   0xf7e11f92 <__setuid+50>: ja 0xf7e11fd0 <__setuid+112>
   0xf7e11f94 <__setuid+52>: add $0x28,%esp
   0xf7e11f97 <__setuid+55>: pop %ebx
   0xf7e11f98 <__setuid+56>: ret
   0xf7e11f99 <__setuid+57>: lea 0x0(%esi,%eiz,1),%esi
   0xf7e11fa0 <__setuid+64>: movl $0xd5,0x8(%esp)
=> 0xf7e11fa8 <__setuid+72>: mov %edx,0xc(%esp)
   0xf7e11fac <__setuid+76>: sub $0xc,%esp
   0xf7e11faf <__setuid+79>: lea 0x14(%esp),%edx
   0xf7e11fb3 <__setuid+83>: mov 0x3d0c(%ebx),%eax
   0xf7e11fb9 <__setuid+89>: ror $0x9,%eax
   0xf7e11fbc <__setuid+92>: xor %gs:0x18,%eax
   0xf7e11fc3 <__setuid+99>: push %edx
   0xf7e11fc4 <__setuid+100>: call *%eax
   0xf7e11fc6 <__setuid+102>: add $0x10,%esp
   0xf7e11fc9 <__setuid+105>: add $0x28,%esp
   0xf7e11fcc <__setuid+108>: pop %ebx
   0xf7e11fcd <__setuid+109>: ret
   0xf7e11fce <__setuid+110>: xchg %ax,%ax
   0xf7e11fd0 <__setuid+112>: mov -0x10c(%ebx),%edx
   0xf7e11fd6 <__setuid+118>: neg %eax
   0xf7e11fd8 <__setuid+120>: mov %eax,%gs:(%edx)
   0xf7e11fdb <__setuid+123>: mov $0xffffffff,%eax

At this point, the eax=1 takes us on a bad path, and we've missed the boat and we're not going to call sysenter. A little later:

   0xf7e11fbc <__setuid+92>: xor %gs:0x18,%eax
   0xf7e11fc3 <__setuid+99>: push %edx
=> 0xf7e11fc4 <__setuid+100>: call *%eax
   0xf7e11fc6 <__setuid+102>: add $0x10,%esp
   0xf7e11fc9 <__setuid+105>: add $0x28,%esp
   0xf7e11fcc <__setuid+108>: pop %ebx
   0xf7e11fcd <__setuid+109>: ret
   0xf7e11fce <__setuid+110>: xchg %ax,%ax
   0xf7e11fd0 <__setuid+112>: mov -0x10c(%ebx),%edx
   0xf7e11fd6 <__setuid+118>: neg %eax
   0xf7e11fd8 <__setuid+120>: mov %eax,%gs:(%edx)
   0xf7e11fdb <__setuid+123>: mov $0xffffffff,%eax
End of assembler dump.
(gdb) stepi
__nptl_setxid (cmdp=0xffffd4f8) at allocatestack.c:1085
1085 {

which eventually winds up here:

(gdb) step
1174 result = INTERNAL_SYSCALL_NCS (cmdp->syscall_no, err, 3,
(gdb) info registers
eax            0xf7d52700   -137025792
ecx            0x1          1
edx            0x0          0
ebx            0xf7f94000   -134660096
esp            0xffffd4a0   0xffffd4a0
ebp            0xf7f961c4   0xf7f961c4 <__stack_user>
esi            0xf0         240
edi            0xf7d52700   -137025792
eip            0xf7f7fc90   0xf7f7fc90 <__nptl_setxid+528>
eflags         0x246        [ PF ZF IF ]
cs             0x23         35
ss             0x2b         43
ds             0x2b         43
es             0x2b         43
fs             0x0          0
gs             0x63         99

and the busted eax value results in syscall_4157941504 because this binary in fact doesn't use nptl, so all the list-walking in __nptl_setxid() is walking off the end of the earth, and cmdp-> is a steaming pile of dingo's kidneys - I'm surprised it didn't SIGSEGV along the way...
The disassembly of the -28 version that works doesn't appear much different:

(gdb) disassemble __setuid,+64
Dump of assembler code from 0xf7e11eb0 to 0xf7e11ef0:
=> 0xf7e11eb0 <setuid+0>: push %ebx
   0xf7e11eb1 <setuid+1>: call 0xf7e906ff <__x86.get_pc_thunk.bx>
   0xf7e11eb6 <setuid+6>: add $0x11514a,%ebx
   0xf7e11ebc <setuid+12>: sub $0x28,%esp
   0xf7e11ebf <setuid+15>: mov 0x3d18(%ebx),%eax
   0xf7e11ec5 <setuid+21>: mov 0x30(%esp),%edx
   0xf7e11ec9 <setuid+25>: test %eax,%eax
   0xf7e11ecb <setuid+27>: jne 0xf7e11ef0 <setuid+64>
   0xf7e11ecd <setuid+29>: xchg %edx,%ebx
   0xf7e11ecf <setuid+31>: mov $0xd5,%eax
   0xf7e11ed4 <setuid+36>: call *%gs:0x10

so I have no idea why the -29 version waltzes off to NPTL land. Did something change in get_pc_thunk.bx?

Version-Release number of selected component (if applicable):
glibc-2.19.90-29.fc22.i686

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
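The numbers in the failing trace can be cross-checked with a few lines of C. This is only a sketch: the helper names are hypothetical and are not glibc code. The 0xfffff000 threshold is taken from the cmp/ja pair in the __setuid disassembly, and 38 is assumed to be ENOSYS, which is its value on Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Linux value of ENOSYS; matches "(errno 38)" in the strace output. */
#define SKETCH_ENOSYS 38

/* Mirrors "cmp $0xfffff000,%eax; ja ...": raw results above 0xfffff000
   (compared as unsigned) encode -errno; anything else is a normal return. */
static int raw_result_is_error(uint32_t raw)
{
    return raw > 0xfffff000u;
}

/* Recover errno from a raw error result (the "neg %eax" step). */
static int raw_result_errno(uint32_t raw)
{
    return (int)(0u - raw);
}

/* Reinterpret a 32-bit register value the way gdb's signed column shows it. */
static int32_t as_signed(uint32_t reg)
{
    return (int32_t)reg;
}

/* Hypothetical self-check tying the trace numbers together. */
static void check_trace_numbers(void)
{
    /* syscall_4157941504 is the eax/edi value from the last register dump. */
    assert(4157941504u == 0xf7d52700u);
    assert(as_signed(0xf7d52700u) == -137025792);

    /* An unknown syscall number comes back from the kernel as -ENOSYS. */
    uint32_t raw = (uint32_t)-SKETCH_ENOSYS;
    assert(raw_result_is_error(raw));
    assert(raw_result_errno(raw) == SKETCH_ENOSYS);

    /* The "setuid32(0) = 0" seen on the working build is not an error. */
    assert(!raw_result_is_error(0u));
}
```

Calling check_trace_numbers() from a small main() passes silently, which is consistent with the reading that a garbage value was used as the syscall number and the kernel rejected it.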
Looking at the glibc bugzilla and ChangeLog, I suspect it's possibly related to Florian Weimer's work on bugs 17135 and/or 13347, or possibly this:

2014-07-07 Roland McGrath <roland.com>
    * sysdeps/nptl/lowlevellock.h: File removed.
    * NEWS: NPTL is no longer an add-on!

Not sure which though....
I have been having a problem with this as well. Downgrading to the previous version of glibc in rawhide made things work again. For me h, sshd, scp, and crond. Fortunately I was still able to run yum downgrade in a pre-existing root shell to fix things.
Bad news here. One machine installed this update and I forgot to downgrade it. After a reboot, the system is no longer usable.
Per c#3, this sounds bad enough to be an Alpha blocker: https://fedoraproject.org/wiki/Fedora_21_Alpha_Release_Criteria#Expected_installed_system_boot_behavior
This hasn't appeared in branched yet, so it's a bit premature for it to be a blocker. (Currently it's only affecting rawhide.)
whoops, you're quite right, my bad. withdrawing. if you could keep an eye and re-propose if it hits f21, though, that'd be great.
I'm keeping an eye on it. My primary desktop at home is using rawhide and I'll need to watch updates until it's fixed. (I could lock the glibc version, but that makes it harder to watch for updates.) If it is going to be a while before this is fixed, I think the maintainer should untag the bad version from rawhide.
And the more I think about it, the more I wonder why 64-bit code works, but 32-bit code blows chunks in a spectacular manner. (And for that matter, why does strace report 'setuid()' in 64 bit mode, but 'setuid32()' in 32-bit mode?)
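On the strace naming question, a rough sketch of the usual explanation: i386 has two setuid system calls, a legacy one with a 16-bit UID argument and setuid32 with a 32-bit one, while x86_64 only ever had a single setuid with 32-bit UIDs. The syscall numbers below are quoted from the kernel syscall tables from memory and should be treated as assumptions, apart from 213, which is the 0xd5 loaded into %eax in the __setuid disassembly. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed syscall numbers; only 213 (0xd5) is visible in this report. */
enum {
    I386_NR_SETUID   = 23,   /* legacy call, 16-bit UID argument */
    I386_NR_SETUID32 = 213,  /* 32-bit UID argument, what glibc uses on i386 */
    X86_64_NR_SETUID = 105   /* single variant on x86_64, 32-bit UID */
};

/* What a 16-bit UID slot can hold of a 32-bit UID: only the low half. */
static uint16_t fit_in_legacy_uid(uint32_t uid)
{
    return (uint16_t)(uid & 0xffffu);
}

/* True if the UID would pass through the legacy 16-bit call unchanged. */
static int survives_legacy_call(uint32_t uid)
{
    return fit_in_legacy_uid(uid) == uid;
}
```

For example, UID 0 fits in either call, but UID 100000 would be cut down to 34464 by a 16-bit slot, which is why 32-bit userspace calls setuid32 and strace reports that name.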
systemd-logind.service entered failed state - has no holdoff time - i686 https://bugzilla.redhat.com/show_bug.cgi?id=1121419
(In reply to Valdis Kletnieks from comment #1)
> Looking at the glibc bugzilla and ChangeLog, I suspect it's possibly related
> to Florian Weimer's work on bugs 17135 and/or 13347, or possibly this:
>
>
> 2014-07-07 Roland McGrath <roland.com>
> * sysdeps/nptl/lowlevellock.h: File removed.
> * NEWS: NPTL is no longer an add-on!
>
> Not sure which though....

https://bugzilla.redhat.com/show_bug.cgi?id=1121419#c6
Maintainers: as this is completely breaking 32-bit Rawhide machines, if you don't have a fix coming pretty soon, could you please revert the changes from -29 until they can be fixed? Thanks.
*** Bug 1121419 has been marked as a duplicate of this bug. ***
Please feel free to untag; Carlos and I are travelling and it is likely that we may not be able to get to it soon enough.
It turns out this was a GCC bug, reported here (for the very same glibc code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling glibc should fix this bug.
I have built -30 with the resync reverted for now. I'll do another resync next week and check if this bug is fixed on i686.
(In reply to Florian Weimer from comment #14)
> It turns out this was a GCC bug, reported here (for the very same glibc
> code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801
>
> It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling
> glibc should fix this bug.

Hey Florian, thanks for finding the *real* cause - I'd never have found that. I got distracted by the fact it was in NPTL code and I couldn't figure out why a single-threaded program was doing that.
-30 does alleviate the immediate issue. I don't know whether you'd like to keep this open until things are fully fixed or just close this now since the fixed build is in rawhide (though it didn't make this morning's compose).
(In reply to Bruno Wolff III from comment #17)
> -30 does alleviate the immediate issue. I don't know whether you'd like to
> keep this open until things are fully fixed or just close this now since the
> fixed build is in rawhide (though it didn't make this morning's compose).

I'll close it once I do the next rebase and confirm that the problem is fixed.
I tested the latest master on i686 and x86_64 and don't see any disasters like bricked boxes/VMs. Basic functionality like rebooting, restarting services and internet connectivity (including yum check-update with a clean cache) works, so I've pushed a rawhide rebase: http://koji.fedoraproject.org/koji/taskinfo?taskID=7204483

I'll close the bug once the build completes.