Bug 1119769 - Python 2|3 test suite fails due to a bug in setxid wrapper
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
Reported: 2014-07-15 13:12 UTC by Matej Stuchlik
Modified: 2016-02-01 02:15 UTC (History)

Fixed In Version: 2.19.90-31.fc21
Doc Type: Bug Fix
Last Closed: 2014-09-29 11:43:17 UTC

Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1120473 None None None Never

Internal Links: 1120473

Description Matej Stuchlik 2014-07-15 13:12:54 UTC
Description of problem:
Python 2|3 test suite fails in koji/mock

Version-Release number of selected component (if applicable):
python-2.7.7-3
python3-3.4.1-13
glibc-2.19.90-28

How reproducible:
Rebuild in koji/mock

Additional info:
Python2 build log: https://kojipkgs.fedoraproject.org//work/tasks/5864/7135864/build.log
Python3 build log: https://kojipkgs.fedoraproject.org//work/tasks/2023/7092023/build.log

This is likely caused by [0], which should be already fixed upstream.

[0] https://sourceware.org/bugzilla/show_bug.cgi?id=17135

Comment 1 Siddhesh Poyarekar 2014-07-15 16:40:08 UTC
I'm rebasing right now, so rawhide should be fixed soonish.

Comment 2 Robert Kuska 2014-07-16 06:19:23 UTC
Rebase needed also in f21.

Comment 3 Matej Stuchlik 2014-07-16 13:25:41 UTC
Rebuilding with glibc-2.19.90-29 seems to fix the issue on armv7hl and x86_64 BUT not on i686 [0]. Any idea why that could be the case, from your point of view?

[0] http://koji.fedoraproject.org/koji/taskinfo?taskID=7149963

Comment 4 Florian Weimer 2014-07-22 19:32:23 UTC
Something is going wrong in __nptl_setxid:

1171	  /* This must be last, otherwise the current thread might not have
1172	     permissions to send SIGSETXID syscall to the other threads.  */
1173	  INTERNAL_SYSCALL_DECL (err);
1174	  result = INTERNAL_SYSCALL_NCS (cmdp->syscall_no, err, 3,
1175					 cmdp->id[0], cmdp->id[1], cmdp->id[2]);

(gdb) print *cmdp
$2 = {syscall_no = 214, id = {-1, -135080352, -134288736}, cntr = 0, error = -1}

This looks okay, but inside __kernel_vsyscall, we have:

(gdb) info reg
eax            0xf7bdb700	-138561792
ecx            0xf7f2d660	-135080352
edx            0xf7feeaa0	-134288736
ebx            0xffffffff	-1
esp            0xffffd25c	0xffffd25c
ebp            0xf7e281c4	0xf7e281c4 <__stack_user>
esi            0xf7bdb700	-138561792
edi            0xf7e26000	-136159232
eip            0xf7fd7420	0xf7fd7420 <__kernel_vsyscall>
eflags         0x246	[ PF ZF IF ]
cs             0x23	35
ss             0x2b	43
ds             0x2b	43
es             0x2b	43
fs             0x0	0
gs             0x63	99

eax should be 214.  Disassembly of line 1174 in __nptl_setxid:

   0xf7e11c90 <+528>:	mov    0x40(%esp),%eax
   0xf7e11c94 <+532>:	mov    0x4(%esp),%esi
   0xf7e11c98 <+536>:	mov    (%eax),%eax
   0xf7e11c9a <+538>:	mov    %eax,0x4(%esp)
   0xf7e11c9e <+542>:	mov    0x40(%esp),%eax
   0xf7e11ca2 <+546>:	mov    0x4(%eax),%edi
   0xf7e11ca5 <+549>:	mov    0x8(%eax),%ecx
   0xf7e11ca8 <+552>:	mov    0xc(%eax),%edx
   0xf7e11cab <+555>:	mov    %esi,%eax
   0xf7e11cad <+557>:	xchg   %ebx,%edi
   0xf7e11caf <+559>:	call   *%gs:0x10

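Looks like the mov %esi,%eax at address 0xf7e11cab overwrites the system call number. %esi was loaded from 0x4(%esp) at +532, before the system call number was stored into that slot at +538, so %eax ends up with the stale slot contents instead of 214.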
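
As a rough illustration of the sequence above, the following C sketch replays the register moves one instruction at a time. It is a model, not glibc code: the struct names, the function replay_sequence, and the use of a single slot variable to stand in for 0x4(%esp) are assumptions made for this example, and the field layout is simply read off the offsets in the disassembly.

```c
#include <stdint.h>

/* Hypothetical model of the structure behind cmdp, using the offsets
   seen in the disassembly: 0x0 syscall_no, 0x4/0x8/0xc id[0..2]. */
struct xid_command_model {
    int32_t syscall_no;
    int32_t id[3];
};

/* Hypothetical snapshot of the registers that matter for the call. */
struct regs_model {
    uint32_t eax, ebx, ecx, edx, esi, edi;
};

/* Replays the instructions at +528 .. +557 in order.
   `slot` stands in for the stack slot at 0x4(%esp);
   `ebx_in` is whatever %ebx held before the sequence started. */
struct regs_model replay_sequence(const struct xid_command_model *cmd,
                                  uint32_t *slot, uint32_t ebx_in)
{
    struct regs_model r = {0};
    uint32_t tmp;

    r.ebx = ebx_in;
    /* +528: mov 0x40(%esp),%eax  (eax = cmdp, modelled by `cmd`) */
    r.esi = *slot;                      /* +532: mov 0x4(%esp),%esi */
    r.eax = (uint32_t)cmd->syscall_no;  /* +536: mov (%eax),%eax    */
    *slot = r.eax;                      /* +538: mov %eax,0x4(%esp) */
    /* +542: mov 0x40(%esp),%eax  (eax = cmdp again) */
    r.edi = (uint32_t)cmd->id[0];       /* +546: mov 0x4(%eax),%edi */
    r.ecx = (uint32_t)cmd->id[1];       /* +549: mov 0x8(%eax),%ecx */
    r.edx = (uint32_t)cmd->id[2];       /* +552: mov 0xc(%eax),%edx */
    r.eax = r.esi;                      /* +555: mov %esi,%eax      */
    tmp = r.ebx;                        /* +557: xchg %ebx,%edi     */
    r.ebx = r.edi;
    r.edi = tmp;
    return r;
}
```

Fed with the values from the gdb session (syscall_no 214, ids -1, -135080352, -134288736, and a slot that initially holds 0xf7bdb700), the model leaves 0xf7bdb700 in eax and 214 in the stack slot, while ebx, ecx and edx carry the three ids, which is consistent with the register dump.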

Perhaps the inline assembly constraints in the definition of INTERNAL_SYSCALL_NCS are off, and the different register allocation in __nptl_setxid triggered this miscompilation.  I will debug this further tomorrow.
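
For context on what such constraints are supposed to guarantee, here is a minimal sketch of a raw system call wrapper written with GCC extended asm. It is not glibc's INTERNAL_SYSCALL_NCS: the helper name raw_syscall0 is hypothetical, and it targets x86_64 Linux (number in rax, syscall instruction) rather than the i386 convention in this report (number in eax, arguments in ebx, ecx and edx, entry through __kernel_vsyscall). On other platforms it simply falls back to the libc syscall() wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper for illustration only: issue a system call that
   takes no arguments and return the raw result. */
long raw_syscall0(long nr)
{
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
    long ret;
    /* "a"(nr) tells the compiler that nr must be in rax when the
       instruction executes; "=a"(ret) says the result comes back in rax.
       The syscall instruction itself overwrites rcx and r11, so they are
       listed as clobbers.  If the compiler fails to honour the input
       constraint, the kernel sees whatever happens to be in the register,
       which is the kind of symptom described above. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Assumption: elsewhere, defer to the libc wrapper. */
    return syscall(nr);
#endif
}
```

Calling raw_syscall0(SYS_getpid) should return the same value as getpid(). The point of the sketch is only that the constraint, not the surrounding C code, is what ties the system call number to the register the kernel reads.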

Comment 5 Florian Weimer 2014-07-23 08:33:41 UTC
It turns out this was a GCC bug, reported here (for the very same glibc code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling glibc should fix this bug.

Comment 6 Siddhesh Poyarekar 2014-07-23 09:17:33 UTC
(In reply to Florian Weimer from comment #5)
> It turns out this was a GCC bug, reported here (for the very same glibc
> code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801
> 
> It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling
> glibc should fix this bug.

Maybe this is bug 1120473 and not this one.  This one is due to your original setxid patch AFAIK.

Comment 7 Florian Weimer 2014-07-23 09:39:53 UTC
(In reply to Siddhesh Poyarekar from comment #6)
> Maybe this is bug 1120473 and not this one.  This one is due to your
> original setxid patch AFAIK.

I was referring to the i386 failure mentioned in comment #3.  I agree that this is bug 1120473.

Comment 8 Dan Horák 2014-07-23 09:50:02 UTC
(In reply to Siddhesh Poyarekar from comment #6)
> (In reply to Florian Weimer from comment #5)
> > It turns out this was a GCC bug, reported here (for the very same glibc
> > code): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801
> > 
> > It has already been fixed in gcc-4.9.1-2 in rawhide, so simply recompiling
> > glibc should fix this bug.
> 
> Maybe this is bug 1120473 and not this one.  This one is due to your
> original setxid patch AFAIK.

Siddhesh, I see rawhide/f22 build running, but please submit a build from f21 branch too.

Comment 9 Siddhesh Poyarekar 2014-07-23 09:57:21 UTC
(In reply to Dan Horák from comment #8)
> Siddhesh, I see rawhide/f22 build running, but please submit a build from
> f21 branch too.

That is a revert to -28 to fix bug 1120473; I'm on leave this week, so I don't have enough time to actually test and do a build.  I'll do a proper f21 rebase (along with a rawhide rebase) next week.

Comment 10 Florian Weimer 2014-07-23 10:13:40 UTC
(In reply to Siddhesh Poyarekar from comment #9)
> (In reply to Dan Horák from comment #8)
> > Siddhesh, I see rawhide/f22 build running, but please submit a build from
> > f21 branch too.
> 
> > That is a revert to -28 to fix bug 1120473; I'm on leave this week, so I don't
> have enough time to actually test and do a build.  I'll do a proper f21
> rebase (along with a rawhide rebase) next week.

Uhm, it's a GCC bug, so just recompiling the current f21 branch should fix things.  No reverting of glibc changes is required.  In rawhide, you didn't even back out any setxid-related changes, but comment #3/bug 1120473 should be addressed nevertheless because the rebuild will pick up a newer GCC.

Comment 11 Siddhesh Poyarekar 2014-07-28 05:43:30 UTC
(In reply to Florian Weimer from comment #10)
> Uhm, it's a GCC bug, so just recompiling the current f21 branch should fix
> things.  No reverting of glibc changes is required.  In rawhide, you didn't
> even back out any setxid-related changes, but comment #3/bug 1120473 should
> be addressed nevertheless because the rebuild will pick up a newer GCC.

I had pushed the revert in response to Adam Williamson's email to me, telling me about broken rawhide on i686.  I was traveling and didn't notice your bz update until after I had pushed the build.

I rebase rawhide weekly, mainly to fish out problems like this, so the next rebase this week should actually fix the problem in bug 1120473.  Maybe I should not have done it while both Carlos and I were traveling.

Comment 12 Siddhesh Poyarekar 2014-09-29 11:43:17 UTC
This was fixed in rawhide and f21 with a resync and rebuild.

commit d579c1af5bae7dd255b3f3f725fabe1dc7d2e061
Author: Siddhesh Poyarekar <siddhesh@redhat.com>
Date:   Tue Jul 29 00:25:03 2014 +0530

    Auto-sync with upstream master

