Bug 1108925 - libcap-ng: issues with python testsuite
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARM64, F-ExcludeArch-aarch64
Reported: 2014-06-12 21:04 UTC by Peter Robinson
Modified: 2014-08-01 13:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-01 13:34:08 UTC
Type: Bug
Embargoed:


Attachments
the build logs (43.46 KB, text/x-log)
2014-06-12 21:04 UTC, Peter Robinson

Description Peter Robinson 2014-06-12 21:04:08 UTC
Created attachment 908307: the build logs

We're seeing libcap-ng fail to build against glibc-2.19.90-17.fc21 or any later release.

With glibc-2.19.90-16.fc21 and earlier it builds fine.

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2405574

Comment 1 Marcin Juszkiewicz 2014-06-12 21:16:16 UTC
Note: those checks were made on an up-to-date rawhide with only the glibc packages swapped for older versions.

Comment 2 Siddhesh Poyarekar 2014-06-12 21:24:04 UTC
Could you please isolate what failed so that we know what's actually wrong?  There seem to be a bunch of tests in a single file.

In other words, a reproducer would be helpful.

Comment 3 Marcin Juszkiewicz 2014-06-12 21:35:42 UTC
Siddhesh: cd bindings/python/test/; make check (or make test)

This runs a Python script that contains several checks and exits as soon as one of them fails. With glibc 2.19.90-16 it passes. With newer versions it fails.
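A minimal sketch of how a fail-fast script like the one described above could report which check tripped, which is what an isolated reproducer needs. This is not the actual libcap-ng test script: the check names and the `run_checks` helper are hypothetical stand-ins.

```python
def run_checks(checks):
    """Run (name, callable) pairs in order.

    Returns the name of the first check that fails (returns a falsy
    value or raises), or None if every check passes.
    """
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            return name
    return None


# Hypothetical stand-ins for the real checks in the test script.
example_checks = [
    ("check_a", lambda: True),
    ("check_b", lambda: True),
    ("check_c", lambda: False),  # pretend this is the check that breaks
    ("check_d", lambda: True),   # never reached, the run stops at check_c
]

first_failure = run_checks(example_checks)
print("first failing check:", first_failure)
```

Knowing the name of the first failing check narrows the search to a single call into the library instead of the whole script.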

Comment 4 Siddhesh Poyarekar 2014-06-13 09:06:42 UTC
It looks like Kyle's patch is causing it.  Kyle, can you please look at it?

Comment 5 Kyle McMartin 2014-06-13 13:40:45 UTC
Yes. Although, given that it's a one-line patch that fixes dlopen-ing anything with TLS, I'm inclined to say we should just declare TLSDESC-by-default a failed experiment and flip to traditional TLS in gcc. Sigh. (Given that, until my patch, we always used a different code path for TLS descriptors until we exhausted static TLS space, I'm not surprised there is ugliness lurking.)

Comment 6 Kyle McMartin 2014-06-13 13:54:14 UTC
Hmm, hang on. It works fine when built against an older glibc that contains that patch. I suspect that might be a red herring. Possibly new gcc skew?

Comment 7 Kyle McMartin 2014-06-13 13:55:06 UTC
Nope, 4.9.0-5 in both. Awesome.

Comment 8 Kyle McMartin 2014-06-13 17:20:17 UTC
Reverting it and rebuilding glibc does indeed seem to fix it. Wonderful. So there's a subtle bug in the _dl_tlsdesc_dynamic code somewhere...

Comment 9 Kyle McMartin 2014-06-17 17:01:55 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=7052250

Well, the failures in the Python test suite are not limited to aarch64. I see them on i686 and x86_64 as well.

Better still, an earlier test fails when you run it in mock --shell in the same way on aarch64 and x86_64.

Better-better still, this means it has nothing to do with TLS descriptors, and may be a generic TLS code generation bug in GCC, since x86_64 does not use them to access dynamic TLS symbols.

Comment 10 Kyle McMartin 2014-06-17 18:34:49 UTC
Building with CFLAGS="-O1" results in a working src/.libs/libcap-ng.so.0, so this looks like an optimization bug on AArch64... My current theory is that the glibc version matters because with the older glibc we'd get a static TLS slot, which means we'd take the fast return path and probably avoid clobbering the register.

The x86_64 thing seems like a red herring at the moment... I'm working on reducing this to the specific gcc optimizer flag responsible.

Comment 11 Kyle McMartin 2014-06-17 19:54:05 UTC
OK, building with -O2 -fno-schedule-insns -fno-schedule-insns2 appears to get things working again.
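The reduction from "-O1 works" down to specific optimizer flags can be sketched as a greedy search over `-fno-*` options. This is only an illustration of the technique, not the procedure actually used here: `minimal_flags`, the candidate list and the `fake_build_passes` predicate are all hypothetical. In practice the predicate would rebuild libcap-ng with `-O2` plus the given flags and run the test suite.

```python
def minimal_flags(candidates, build_passes):
    """Greedily drop flags that are not needed for the build to pass.

    candidates:   list of extra compiler flags that together fix the build
    build_passes: callable taking a list of flags, returning True if the
                  test suite passes when built with -O2 plus those flags
    """
    needed = list(candidates)
    if not build_passes(needed):
        raise ValueError("test suite fails even with every candidate flag")
    for flag in candidates:
        trial = [f for f in needed if f != flag]
        if build_passes(trial):
            needed = trial  # this flag was not required, drop it
    return needed


# Candidate flags to try on top of -O2 (an arbitrary illustrative selection).
CANDIDATES = [
    "-fno-inline",
    "-fno-schedule-insns",
    "-fno-strict-aliasing",
    "-fno-schedule-insns2",
]


def fake_build_passes(flags):
    # Assumption for illustration only: pretend both scheduling passes must
    # be disabled. The thread does not say whether both flags are required.
    return "-fno-schedule-insns" in flags and "-fno-schedule-insns2" in flags


print(minimal_flags(CANDIDATES, fake_build_passes))
```

The greedy pass needs one rebuild per candidate flag, which is slow but simple. It only yields a minimal set when adding a flag never turns a passing build into a failing one.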

