Bug 1108925

Summary: libcap-ng: issues with python testsuite
Product: [Fedora] Fedora Reporter: Peter Robinson <pbrobinson>
Component: gcc Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: blc, codonell, jakub, kmcmartin, law, mjuszkie, pbrobinson, pfrankli, spoyarek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-08-01 13:34:08 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 922257    
Attachments:
Description Flags
the build logs none

Description Peter Robinson 2014-06-12 21:04:08 UTC
Created attachment 908307
the build logs

We're seeing libcap-ng fail to build with glibc-2.19.90-17.fc21 or any later release.

With glibc-2.19.90-16.fc21 and earlier it builds fine.

http://arm.koji.fedoraproject.org/koji/taskinfo?taskID=2405574

Comment 1 Marcin Juszkiewicz 2014-06-12 21:16:16 UTC
Note: those checks were made on an up-to-date rawhide with only the glibc packages swapped for older versions.

Comment 2 Siddhesh Poyarekar 2014-06-12 21:24:04 UTC
Could you please isolate what failed so that we know exactly what's wrong? There seem to be a bunch of tests in a single file.

In other words, a reproducer would be helpful.

Comment 3 Marcin Juszkiewicz 2014-06-12 21:35:42 UTC
Siddhesh: cd bindings/python/test/; make check (or make test)

It will run a Python script which has several checks; when one of them fails, it exits. With glibc 2.19.90-16 it passes. With newer versions it fails.
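
Because the script stops at the first failing check, a single run says little about which checks are affected. One way to isolate them is to run every check independently and collect the failures. The following is a minimal sketch of that idea, not the actual libcap-ng test script: `run_checks` and the `check_a`/`check_b`/`check_c` entries are hypothetical stand-ins for the real checks.

```python
from typing import Callable, List, Tuple


def run_checks(checks: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run every check and return the names of the ones that failed.

    Unlike a script that exits on the first failure, this keeps going,
    so one run reports every failing check.
    """
    failed = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            print(f"{name}: ERROR ({exc})")
            failed.append(name)
            continue
        print(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            failed.append(name)
    return failed


# Hypothetical stand-ins; the real script's checks would go here.
EXAMPLE_CHECKS = [
    ("check_a", lambda: True),
    ("check_b", lambda: False),
    ("check_c", lambda: True),
]
```

Calling `run_checks(EXAMPLE_CHECKS)` prints one line per check and returns the list of failed names, which makes it easier to turn one failing check into a standalone reproducer.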

Comment 4 Siddhesh Poyarekar 2014-06-13 09:06:42 UTC
It looks like Kyle's patch is causing it.  Kyle, can you please look at it?

Comment 5 Kyle McMartin 2014-06-13 13:40:45 UTC
Yes. Although, given it's a one-line patch that fixes dlopen-ing anything with TLS, I'm inclined to say we should just declare TLSDESC-by-default a failed experiment and flip to traditional TLS in gcc. Sigh. (Given that, until my patch, we always used a different code path for TLS descriptors until static TLS space was exhausted, I'm not surprised there is ugliness lurking.)

Comment 6 Kyle McMartin 2014-06-13 13:54:14 UTC
Hmm, hang on. It works fine built against an older glibc which contains that patch. I suspect that might be a red herring. Possibly new gcc skew?

Comment 7 Kyle McMartin 2014-06-13 13:55:06 UTC
Nope, 4.9.0-5 in both. Awesome.

Comment 8 Kyle McMartin 2014-06-13 17:20:17 UTC
Reverting it and rebuilding glibc does indeed seem to fix it. Wonderful. So there's a subtle bug in the _dl_tlsdesc_dynamic code somewhere...

Comment 9 Kyle McMartin 2014-06-17 17:01:55 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=7052250

Well, the failures in the Python test suite are not limited to aarch64. I see them on i686 and x86_64 as well.

Better still, an earlier test fails when you run it in mock --shell in the same way on aarch64 and x86_64.

Better-better still, this means it has nothing to do with TLS descriptors, and may be a generic TLS code generation bug in GCC, since x86_64 does not use them to access dynamic TLS symbols.

Comment 10 Kyle McMartin 2014-06-17 18:34:49 UTC
Building with CFLAGS="-O1" results in a working src/.libs/libcap-ng.so.0, so this looks like an optimization bug on AArch64. My current theory is that the glibc version matters because, with the older glibc, we'd get a static TLS slot, so we'd take the fast return path and probably avoid clobbering the register.

The x86_64 thing seems like a red herring at the moment... I'm working on reducing this to the appropriate gcc optimizer flag.

Comment 11 Kyle McMartin 2014-06-17 19:54:05 UTC
OK, building with -O2 -fno-schedule-insns -fno-schedule-insns2 appears to get things working again.
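
Reducing this further means finding the smallest set of flags that still produces a working build. A rough sketch of that search follows, assuming a caller-supplied predicate that rebuilds with the given flags and reports whether the test suite passes. `minimal_fixing_sets` and `fake_test_passes` are hypothetical names, and the stub's outcome is invented for illustration; the thread only reports that both flags together work.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def minimal_fixing_sets(
    candidates: Sequence[str],
    test_passes: Callable[[Tuple[str, ...]], bool],
) -> List[Tuple[str, ...]]:
    """Return the smallest subsets of `candidates` for which test_passes is true.

    Subsets are tried in order of increasing size, and the search stops
    at the first size that yields at least one passing subset.
    """
    for size in range(len(candidates) + 1):
        hits = [c for c in combinations(candidates, size) if test_passes(c)]
        if hits:
            return hits
    return []


# The two flags named in the comment above.
CANDIDATES = ["-fno-schedule-insns", "-fno-schedule-insns2"]


def fake_test_passes(flags: Tuple[str, ...]) -> bool:
    # ASSUMPTION, for illustration only: pretend the build works only when
    # both flags are present. A real predicate would rebuild with
    # CFLAGS="-O2 <flags>" and run the test suite.
    return all(flag in flags for flag in CANDIDATES)
```

With the stub, `minimal_fixing_sets(CANDIDATES, fake_test_passes)` returns the single pair of both flags. With a real predicate the result would show whether one scheduling pass alone is enough to trigger the failure.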