1120473 – setxid wrapper in glibc-2.19.90-29 is miscompiled due to gcc bug

Summary: setxid wrapper in glibc-2.19.90-29 is miscompiled due to gcc bug

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Carlos O'Donell
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1121419
Depends On:
Blocks:

Reported:	2014-07-17 03:36 UTC by Valdis Kletnieks
Modified:	2014-07-30 08:11 UTC
CC List:	13 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-07-30 08:11:36 UTC
Type:	Bug
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNU Compiler Collection	61801	0	P3	RESOLVED	sched2 miscompiles syscall sequence with -g	2020-09-23 07:17:36 UTC
Red Hat Bugzilla	1119769	0	unspecified	CLOSED	Python 2\|3 test suite fails due to a bug in setxid wrapper	2024-07-30 11:20:27 UTC

Internal Links: 1119769

Description Valdis Kletnieks 2014-07-17 03:36:35 UTC

Description of problem: While trying to get a third-party 32-bit program to run, I hit problems with setuid32. strace of the binary on an Ubuntu system reported:

[f7727425] fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[f7727425] setuid32(0)                  = 0
[f7727425] time(NULL)                   = 1405561829

while on my Rawhide box it dies mysteriously:

[f7fd5aa0] fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[f7fd5aa0] syscall_4157941504(0, 0xffffd628, 0xf7feeaa0, 0xf7d52700, 0xf7f94000, 0xf7f961c4) = -1 (errno 38)

Testing shows that it does 'setuid32(0) = 0' with 2.19.90-28 on the Rawhide box as well, and the binary functions just fine.

Here is where it goes astray, according to GDB (cutting and pasting the relevant pieces):

(gdb) disassemble __setuid,+64
Dump of assembler code from 0xf7e11f60 to 0xf7e11fa0:
   0xf7e11f60 <__setuid+0>:     push   %ebx
   0xf7e11f61 <__setuid+1>:     call   0xf7e9080f <__x86.get_pc_thunk.bx>
   0xf7e11f66 <__setuid+6>:     add    $0x11509a,%ebx
   0xf7e11f6c <__setuid+12>:    sub    $0x28,%esp
   0xf7e11f6f <__setuid+15>:    mov    0x3d18(%ebx),%eax
   0xf7e11f75 <__setuid+21>:    mov    0x30(%esp),%edx
=> 0xf7e11f79 <__setuid+25>:    test   %eax,%eax
   0xf7e11f7b <__setuid+27>:    jne    0xf7e11fa0 <__setuid+64>
   0xf7e11f7d <__setuid+29>:    xchg   %edx,%ebx
   0xf7e11f7f <__setuid+31>:    mov    $0xd5,%eax
   0xf7e11f84 <__setuid+36>:    call   *%gs:0x10
   0xf7e11f8b <__setuid+43>:    xchg   %edx,%ebx
   0xf7e11f8d <__setuid+45>:    cmp    $0xfffff000,%eax
   0xf7e11f92 <__setuid+50>:    ja     0xf7e11fd0 <__setuid+112>
   0xf7e11f94 <__setuid+52>:    add    $0x28,%esp
   0xf7e11f97 <__setuid+55>:    pop    %ebx
   0xf7e11f98 <__setuid+56>:    ret    
   0xf7e11f99 <__setuid+57>:    lea    0x0(%esi,%eiz,1),%esi
End of assembler dump.
(gdb) stepi
0xf7e11f7b      29        result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) info registers
eax            0x1      1
ecx            0x53c6e214       1405542932
edx            0x0      0
ebx            0xf7f27000       -135106560
esp            0xffffd4f0       0xffffd4f0
ebp            0xffffd638       0xffffd638
esi            0x0      0
edi            0x804f200        134541824
eip            0xf7e11f7b       0xf7e11f7b <__setuid+27>
eflags         0x202    [ IF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
(gdb) stepi
29        result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) stepi
0xf7e11fa8      29        result = INLINE_SETXID_SYSCALL (setuid32, 1, uid);
(gdb) disassemble __setuid,+128
Dump of assembler code from 0xf7e11f60 to 0xf7e11fe0:
   0xf7e11f60 <__setuid+0>:     push   %ebx
   0xf7e11f61 <__setuid+1>:     call   0xf7e9080f <__x86.get_pc_thunk.bx>
   0xf7e11f66 <__setuid+6>:     add    $0x11509a,%ebx
   0xf7e11f6c <__setuid+12>:    sub    $0x28,%esp
   0xf7e11f6f <__setuid+15>:    mov    0x3d18(%ebx),%eax
   0xf7e11f75 <__setuid+21>:    mov    0x30(%esp),%edx
   0xf7e11f79 <__setuid+25>:    test   %eax,%eax
   0xf7e11f7b <__setuid+27>:    jne    0xf7e11fa0 <__setuid+64>
   0xf7e11f7d <__setuid+29>:    xchg   %edx,%ebx
   0xf7e11f7f <__setuid+31>:    mov    $0xd5,%eax
   0xf7e11f84 <__setuid+36>:    call   *%gs:0x10
   0xf7e11f8b <__setuid+43>:    xchg   %edx,%ebx
   0xf7e11f8d <__setuid+45>:    cmp    $0xfffff000,%eax
   0xf7e11f92 <__setuid+50>:    ja     0xf7e11fd0 <__setuid+112>
   0xf7e11f94 <__setuid+52>:    add    $0x28,%esp
   0xf7e11f97 <__setuid+55>:    pop    %ebx
   0xf7e11f98 <__setuid+56>:    ret    
   0xf7e11f99 <__setuid+57>:    lea    0x0(%esi,%eiz,1),%esi
   0xf7e11fa0 <__setuid+64>:    movl   $0xd5,0x8(%esp)
=> 0xf7e11fa8 <__setuid+72>:    mov    %edx,0xc(%esp)
   0xf7e11fac <__setuid+76>:    sub    $0xc,%esp
   0xf7e11faf <__setuid+79>:    lea    0x14(%esp),%edx
   0xf7e11fb3 <__setuid+83>:    mov    0x3d0c(%ebx),%eax
   0xf7e11fb9 <__setuid+89>:    ror    $0x9,%eax
   0xf7e11fbc <__setuid+92>:    xor    %gs:0x18,%eax
   0xf7e11fc3 <__setuid+99>:    push   %edx
   0xf7e11fc4 <__setuid+100>:   call   *%eax
   0xf7e11fc6 <__setuid+102>:   add    $0x10,%esp
   0xf7e11fc9 <__setuid+105>:   add    $0x28,%esp
   0xf7e11fcc <__setuid+108>:   pop    %ebx
   0xf7e11fcd <__setuid+109>:   ret    
   0xf7e11fce <__setuid+110>:   xchg   %ax,%ax
   0xf7e11fd0 <__setuid+112>:   mov    -0x10c(%ebx),%edx
   0xf7e11fd6 <__setuid+118>:   neg    %eax
   0xf7e11fd8 <__setuid+120>:   mov    %eax,%gs:(%edx)
   0xf7e11fdb <__setuid+123>:   mov    $0xffffffff,%eax


At this point, eax=1 sends us down the wrong path: the jne at __setuid+27 is taken, so we jump to __setuid+64 and never reach the direct call through %gs:0x10 (the sysenter path) at __setuid+36.
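
To make the two paths in the dump easier to follow, here is a rough model of the branch, written as a sketch rather than as glibc's actual code. The function name wrapper_path and the field name id0 are made up for illustration; syscall_no is the field name that appears in the later INTERNAL_SYSCALL_NCS source line, and 0xd5 is the immediate seen at __setuid+31 and __setuid+64.

```python
SETUID32_NR = 0xD5  # immediate used at __setuid+31 and __setuid+64 in the dump


def wrapper_path(flag: int, uid: int) -> dict:
    """Model which branch of the dumped wrapper runs for a given flag value.

    `flag` stands for the word loaded from 0x3d18(%ebx) into eax.
    This is an illustrative model of the disassembly, not glibc source.
    """
    if flag != 0:
        # test %eax,%eax ; jne __setuid+64
        # The syscall number and uid are stored on the stack as a command
        # block, then an indirect call is made through a function pointer.
        return {
            "path": "indirect",
            "command": {"syscall_no": SETUID32_NR, "id0": uid},  # id0: hypothetical name
        }
    # Fall through: xchg %edx,%ebx ; mov $0xd5,%eax ; call *%gs:0x10
    return {"path": "direct", "eax": SETUID32_NR, "ebx": uid}


print(wrapper_path(1, 0)["path"])  # flag value seen in the failing session
print(wrapper_path(0, 0)["path"])
```

With the eax=1 value from the register dump, the model takes the indirect branch, which matches the jump to __setuid+64 seen in the session.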

A little later:

   0xf7e11fbc <__setuid+92>:    xor    %gs:0x18,%eax
   0xf7e11fc3 <__setuid+99>:    push   %edx
=> 0xf7e11fc4 <__setuid+100>:   call   *%eax
   0xf7e11fc6 <__setuid+102>:   add    $0x10,%esp
   0xf7e11fc9 <__setuid+105>:   add    $0x28,%esp
   0xf7e11fcc <__setuid+108>:   pop    %ebx
   0xf7e11fcd <__setuid+109>:   ret    
   0xf7e11fce <__setuid+110>:   xchg   %ax,%ax
   0xf7e11fd0 <__setuid+112>:   mov    -0x10c(%ebx),%edx
   0xf7e11fd6 <__setuid+118>:   neg    %eax
   0xf7e11fd8 <__setuid+120>:   mov    %eax,%gs:(%edx)
   0xf7e11fdb <__setuid+123>:   mov    $0xffffffff,%eax
End of assembler dump.
(gdb) stepi
__nptl_setxid (cmdp=0xffffd4f8) at allocatestack.c:1085
1085    {

which eventually winds up here:

(gdb) step
1174      result = INTERNAL_SYSCALL_NCS (cmdp->syscall_no, err, 3,
(gdb) info registers
eax            0xf7d52700       -137025792
ecx            0x1      1
edx            0x0      0
ebx            0xf7f94000       -134660096
esp            0xffffd4a0       0xffffd4a0
ebp            0xf7f961c4       0xf7f961c4 <__stack_user>
esi            0xf0     240
edi            0xf7d52700       -137025792
eip            0xf7f7fc90       0xf7f7fc90 <__nptl_setxid+528>
eflags         0x246    [ PF ZF IF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99

and the busted eax value results in syscall_4157941504 because this binary in fact doesn't use nptl, so all the list-walking in __nptl_setxid() is walking off the end of the earth, and cmdp-> is a steaming pile of dingo's kidneys - I'm surprised it didn't SIGSEGV along the way...
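
As a quick sanity check on the numbers above, the bogus syscall number in the strace output is the same 32-bit value that gdb shows in eax. The snippet below is only an arithmetic sketch using values copied from this report; the helper name to_unsigned32 is made up for illustration. (On Linux, errno 38 is ENOSYS, which is what the kernel returns for a syscall number it does not implement.)

```python
# Values copied from the strace and gdb output in this report.
STRACE_SYSCALL_NO = 4157941504   # from "syscall_4157941504(...)"
GDB_EAX_HEX = 0xF7D52700         # eax at __nptl_setxid+528
GDB_EAX_SIGNED = -137025792      # gdb's signed rendering of the same register


def to_unsigned32(value: int) -> int:
    """Reinterpret a possibly negative integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


same_value = STRACE_SYSCALL_NO == GDB_EAX_HEX == to_unsigned32(GDB_EAX_SIGNED)
print(hex(STRACE_SYSCALL_NO), same_value)  # prints: 0xf7d52700 True
```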

The disassembly of the -28 version that works doesn't appear much different:
(gdb) disassemble __setuid,+64
Dump of assembler code from 0xf7e11eb0 to 0xf7e11ef0:
=> 0xf7e11eb0 <setuid+0>:       push   %ebx
   0xf7e11eb1 <setuid+1>:       call   0xf7e906ff <__x86.get_pc_thunk.bx>
   0xf7e11eb6 <setuid+6>:       add    $0x11514a,%ebx
   0xf7e11ebc <setuid+12>:      sub    $0x28,%esp
   0xf7e11ebf <setuid+15>:      mov    0x3d18(%ebx),%eax
   0xf7e11ec5 <setuid+21>:      mov    0x30(%esp),%edx
   0xf7e11ec9 <setuid+25>:      test   %eax,%eax
   0xf7e11ecb <setuid+27>:      jne    0xf7e11ef0 <setuid+64>
   0xf7e11ecd <setuid+29>:      xchg   %edx,%ebx
   0xf7e11ecf <setuid+31>:      mov    $0xd5,%eax
   0xf7e11ed4 <setuid+36>:      call   *%gs:0x10

so I have no idea why the -29 version waltzes off to NPTL land. Did something change in get_pc_thunk.bx?


Version-Release number of selected component (if applicable):
glibc-2.19.90-29.fc22.i686

Comment 1 Valdis Kletnieks 2014-07-17 04:39:27 UTC

Looking at the glibc bugzilla and ChangeLog, I suspect it's possibly related to Florian Weimer's work on bugs 17135 and/or 13347, or possibly this:


2014-07-07  Roland McGrath  <roland.com>
        * sysdeps/nptl/lowlevellock.h: File removed.
        * NEWS: NPTL is no longer an add-on!

Not sure which though....

Comment 2 Bruno Wolff III 2014-07-17 19:05:59 UTC

I have been having a problem with this as well. Downgrading to the previous version of glibc in rawhide made things work again. For me it broke ssh, sshd, scp, and crond. Fortunately I was still able to run yum downgrade in a pre-existing root shell to fix things.

Comment 3 Christopher Meng 2014-07-18 00:55:28 UTC

Bad news here. 

One machine installed this update and I forgot to downgrade it; after a reboot the system is no longer usable.

Comment 4 Adam Williamson 2014-07-18 16:53:23 UTC

per c#3, this sounds bad enough to be an Alpha blocker, https://fedoraproject.org/wiki/Fedora_21_Alpha_Release_Criteria#Expected_installed_system_boot_behavior .

Comment 5 Bruno Wolff III 2014-07-18 17:03:10 UTC

This hasn't appeared in branched yet, so it's a bit premature for it to be a blocker. (Currently it's only affecting rawhide.)

Comment 6 Adam Williamson 2014-07-18 20:26:20 UTC

whoops, you're quite right, my bad. withdrawing. if you could keep an eye and re-propose if it hits f21, though, that'd be great.

Comment 7 Bruno Wolff III 2014-07-18 20:33:07 UTC

I'm keeping an eye on it. My primary desktop at home is using rawhide and I'll need to watch updates until it's fixed. (I could lock the glibc version, but that makes it harder to watch for updates.)
I think that if it is going to be a while before this is fixed, the maintainer should untag the bad version from rawhide.

Comment 8 Valdis Kletnieks 2014-07-18 20:55:08 UTC

And the more I think about it, the more I wonder why 64-bit code works, but 32-bit code blows chunks in a spectacular manner. (And for that matter, why does strace report 'setuid()' in 64-bit mode, but 'setuid32()' in 32-bit mode?)
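
On the strace naming question only: as far as I understand the Linux syscall tables, i386 keeps a legacy setuid entry with 16-bit UIDs alongside setuid32, while x86_64 has a single setuid. The sketch below is a toy lookup, not strace's implementation; the numbers are from memory of the kernel tables and should be checked against your own headers (asm/unistd_32.h and asm/unistd_64.h), and the function name name_for is made up.

```python
# Assumed syscall numbers; verify against your kernel headers before relying on them.
SYSCALL_NUMBERS = {
    "i386": {"setuid": 23, "setuid32": 213},
    "x86_64": {"setuid": 105},
}


def name_for(arch: str, number: int) -> str:
    """Return a syscall name for (arch, number), or a syscall_<n> placeholder."""
    for name, nr in SYSCALL_NUMBERS[arch].items():
        if nr == number:
            return name
    # Unknown numbers get a generic label, similar in spirit to the
    # "syscall_4157941504" line in the original strace output.
    return f"syscall_{number}"


print(name_for("i386", 0xD5))        # number loaded by `mov $0xd5,%eax`
print(name_for("i386", 4157941504))  # the bogus value from the failing run
```

This says nothing about why the 64-bit build was unaffected; it only shows why the 32-bit wrapper's 0xd5 shows up as setuid32.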

Comment 9 poma 2014-07-21 09:51:02 UTC

systemd-logind.service entered failed state - has no holdoff time - i686
https://bugzilla.redhat.com/show_bug.cgi?id=1121419

Comment 10 poma 2014-07-21 15:08:57 UTC

(In reply to Valdis Kletnieks from comment #1)
> Looking at the glibc bugzilla and ChangeLog, I suspect it's possibly related
> to Florian Weimer's work on bugs 17135 and/or 13347, or possibly this:
> 
> 
> 2014-07-07  Roland McGrath  <roland.com>
>         * sysdeps/nptl/lowlevellock.h: File removed.
>         * NEWS: NPTL is no longer an add-on!
> 
> Not sure which though....

https://bugzilla.redhat.com/show_bug.cgi?id=1121419#c6

Comment 11 Adam Williamson 2014-07-21 15:17:49 UTC

Maintainers: as this is completely breaking 32-bit Rawhide machines, if you don't have a fix coming pretty soon, could you please revert the changes from -29 until they can be fixed? Thanks.

Comment 12 Adam Williamson 2014-07-21 15:49:00 UTC

*** Bug 1121419 has been marked as a duplicate of this bug. ***

Comment 13 Siddhesh Poyarekar 2014-07-21 18:49:32 UTC

Please feel free to untag; Carlos and I are travelling and it is likely that we may not be able to get to it soon enough.

Comment 14 Florian Weimer 2014-07-23 09:41:50 UTC

It turns out this was a GCC bug, reported here (for the very same glibc code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling glibc should fix this bug.

Comment 15 Siddhesh Poyarekar 2014-07-23 09:53:10 UTC

I have built -30 with the resync reverted for now.  I'll do another resync next week and check if this bug is fixed on i686.

Comment 16 Valdis Kletnieks 2014-07-23 11:57:37 UTC

(In reply to Florian Weimer from comment #14)
> It turns out this was a GCC bug, reported here (for the very same glibc
> code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801
> 
> It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling
> glibc should fix this bug.

Hey Florian, thanks for finding the *real* cause - I'd never have found that. I got distracted by the fact it was in NPTL code and I couldn't figure out why a single-threaded program was doing that.

Comment 17 Bruno Wolff III 2014-07-23 14:44:17 UTC

-30 does alleviate the immediate issue. I don't know whether you'd like to keep this open until things are fully fixed or just close this now since the fixed build is in rawhide (though it didn't make this morning's compose).

Comment 18 Siddhesh Poyarekar 2014-07-28 05:44:21 UTC

(In reply to Bruno Wolff III from comment #17)
> -30 does alleviate the immediate issue. I don't know whether you'd like to
> keep this open until things are fully fixed or just close this now since the
> fixed build is in rawhide (though it didn't make this morning's compose).

I'll close it once I do the next rebase and confirm that the problem is fixed.

Comment 19 Siddhesh Poyarekar 2014-07-28 19:00:26 UTC

I tested the latest master on i686 and x86_64 and don't see any disasters such as bricked boxes/VMs. Basic functionality such as rebooting, restarting services, and internet connectivity (including yum check-update with a clean cache) works, so I've pushed a rawhide rebase:

http://koji.fedoraproject.org/koji/taskinfo?taskID=7204483

I'll close the bug once the build completes.
