== Comment: #0 - Brent J. Baude <baude.com> - 2012-10-08 13:11:28 == The optimized ppc64p7 build of glibc-2.16-15.fc18 appears to have an issue with the loader. It will render a system unusable if installed. We didn't observe these problems in earlier test builds.
------- Comment From ryanarn.com 2012-10-08 19:11 EDT------- Backtrace in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x000000004f769810 in call_init (l=l@entry=0x4f794930, argc=argc@entry=1, argv=argv@entry=0xffffffff150, env=env@entry=0xffffffff160) at dl-init.c:82
82 ((init_t) addrs[j]) (argc, argv, env);
(gdb) bt
#0 0x000000004f769810 in call_init (l=l@entry=0x4f794930, argc=argc@entry=1, argv=argv@entry=0xffffffff150, env=env@entry=0xffffffff160) at dl-init.c:82
#1 0x000000004f769944 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=0x4f794930) at dl-init.c:53
#2 _dl_init (main_map=0x4f792bf0, argc=<optimized out>, argv=0xffffffff150, env=0xffffffff160) at dl-init.c:131
#3 0x000000004f75651c in ._dl_start_user () from /root/rpmbuild/BUILD/glibc-2.16-75f0d304/build-ppc64-redhat-linux/elf/ld64.so.1
------- Comment From kamaleshb.com 2012-10-12 10:38 EDT------- *** Bug 85930 has been marked as a duplicate of this bug. ***
------- Comment From ryanarn.com 2012-10-17 23:25 EDT------- Building GLIBC with -O3 -fvect-cost-model -fno-tree-vectorize eliminates the segmentation violations. This is not THE solution but should point toward the problem.
------- Comment From ryanarn.com 2012-10-24 14:58 EDT------- This issue has been identified as a linker and compiler issue under -O3 optimization. For some reason the linker is inserting a NULL entry into libdl.so's init_array. When the loader walks this array and invokes the init functions it dereferences this NULL pointer and crashes. It's possible that the compiler is emitting some sort of bogus relocation that the linker then resolves into a null pointer.

The following patch is a workaround in the loader code to not dereference null init_array entries. Ultimately this is NOT the correct solution:

--- 1-glibc-2.16-75f0d304/elf/dl-init.c 2012-10-23 17:44:52.081314190 -0500
+++ 2-glibc-2.16-75f0d304/elf/dl-init.c 2012-10-23 20:32:30.501460972 -0500
@@ -79,7 +79,9 @@
       addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
       for (j = 0; j < jm; ++j)
-        ((init_t) addrs[j]) (argc, argv, env);
+        /* Workaround linker bug which inserts a null entry at -O3. */
+        if (addrs[j])
+          ((init_t) addrs[j]) (argc, argv, env);
     }
 }

From an objdump:

Contents of section .init_array:
 1fda0 00000000 00020050 00000000 00000000 .......P........
 1fdb0 00000000 000201a0

Curiously, the disassembly of the .init_array section doesn't show the null entry though that might be what the '...' indicates in the output:

Disassembly of section .init_array:

000000000001fda0 <__frame_dummy_init_array_entry>:
 1fda0: 00 00 00 00 .long 0x0
   1fda0: R_PPC64_RELATIVE *ABS*+0x20050
 1fda4: 00 02 00 50 .long 0x20050
 ...

000000000001fdb0 <init_array>:
 1fdb0: 00 00 00 00 .long 0x0
   1fdb0: R_PPC64_RELATIVE *ABS*+0x201a0
 1fdb4: 00 02 01 a0 .long 0x201a0

In a version of the library compiled with -O2 we don't see the null entry at all:

Contents of section .init_array:
 1fdb0 00000000 00020050 00000000 000201a0 .......P........

Disassembly of section .init_array:

000000000001fdb0 <__frame_dummy_init_array_entry>:
 1fdb0: 00 00 00 00 .long 0x0
   1fdb0: R_PPC64_RELATIVE *ABS*+0x20050
 1fdb4: 00 02 00 50 .long 0x20050

000000000001fdb8 <init_array>:
 1fdb8: 00 00 00 00 .long 0x0
   1fdb8: R_PPC64_RELATIVE *ABS*+0x201a0
 1fdbc: 00 02 01 a0 .long 0x201a0

I checked and there is not a bogus OPD entry in the failure case, just the dummy frame pointer amongst others and this, which is valid.

00000000000201a0 <init>:
 201a0: 00 00 00 00 .long 0x0
   201a0: R_PPC64_RELATIVE *ABS*+0x2bb0
 201a4: 00 00 2b b0 .long 0x2bb0
 201a8: 00 00 00 00 .long 0x0
   201a8: R_PPC64_RELATIVE *ABS*+0x281b8
 201ac: 00 02 81 b8 .long 0x281b8

I think much of this comes from dlfcn.os, but when I compare the good build to the bad build I see the exact same thing:

Contents of section .init_array:
 0000 00000000 00000000

Disassembly of section .text:

0000000000000000 <.init>:
 0: e9 42 00 00 ld r10,0(r2)
   2: R_PPC64_TOC16_DS .toc
 4: e9 22 00 08 ld r9,8(r2)
   6: R_PPC64_TOC16_DS .toc+0x8
 8: 90 6a 00 00 stw r3,0(r10)
 c: f8 89 00 00 std r4,0(r9)
 10: 4e 80 00 20 blr
 ...

Disassembly of section .opd:

0000000000000000 <init>:
 ...
   0: R_PPC64_ADDR64 .text
   8: R_PPC64_TOC *ABS*

Disassembly of section .init_array:

0000000000000000 <init_array>:
 ...
   0: R_PPC64_ADDR64 .opd

So at this point, I'm not sure what's going on.
Created attachment 632839 [details] faulty libdl.so.2 compiled with -O3 ------- Comment (attachment only) From ryanarn.com 2012-10-24 15:07 EDT-------
Created attachment 632840 [details] Good libdl.so.2 built with -O2 ------- Comment (attachment only) From ryanarn.com 2012-10-24 15:07 EDT-------
Created attachment 632841 [details] disassembly of bad_libdl.so.2 ------- Comment (attachment only) From ryanarn.com 2012-10-24 15:08 EDT-------
Created attachment 632843 [details] disassembly of good_libdl.so.2 ------- Comment (attachment only) From ryanarn.com 2012-10-24 15:09 EDT-------
------- Comment From ryanarn.com 2012-10-24 18:41 EDT------- Relinking libdl.so.2 with -Wl,-Map,foo.map indicates the following for the init_array linkage:

.rela.init_array 0x00000000000006f0 0x18 linker stubs

.init_array 0x000000000001fdd0 0x8
 *(SORT(.init_array.*) SORT(.ctors.*))
 *(.init_array)
 .init_array 0x000000000001fdd0 0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
 *(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)
Created attachment 632932 [details] relinked with -Wl,-Map,relink.map ------- Comment (attachment only) From ryanarn.com 2012-10-24 18:43 EDT-------
------- Comment From ryanarn.com 2012-10-24 22:00 EDT------- Comment on attachment 74738 failure case map file generated with -Wl,-Map,failure.map This map file was generated incorrectly.
------- Comment From ryanarn.com 2012-10-24 22:04 EDT------- The .init_array from the map file from the -O2 (good) build:

.init_array 0x000000000001fdb0 0x10
 *(SORT(.init_array.*) SORT(.ctors.*))
 *(.init_array)
 .init_array 0x000000000001fdb0 0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
 .init_array 0x000000000001fdb8 0x8 /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os)
 *(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)

The .init_array from the map file from the -O3 (failure) build:

.init_array 0x000000000001fda0 0x18
 *(SORT(.init_array.*) SORT(.ctors.*))
 *(.init_array)
 .init_array 0x000000000001fda0 0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
 *fill* 0x000000000001fda8 0x8
 .init_array 0x000000000001fdb0 0x8 /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os)
 *(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)

So the '*fill*' entry seems to be problematic.
Created attachment 633055 [details] -Wl,-Map,bad.map for -O3 build ------- Comment (attachment only) From ryanarn.com 2012-10-24 22:06 EDT-------
Created attachment 633056 [details] -Wl,-Map,good.map for -O2 build ------- Comment (attachment only) From ryanarn.com 2012-10-24 22:06 EDT-------
------- Comment From amodra.com 2012-10-24 23:58 EDT------- Yes, that fill is why you have zeros in .init_array, and I'll bet the fill is there because /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os) .init_array section is improperly aligned to a 16-byte boundary. So check the object file section headers using readelf to verify my hypothesis, then you'll need to figure out why you're getting increased alignment on that object. Hmm, I see dlfcn.c on my (a little out of date) copy of glibc sources explicitly generates the .init_array entry aligned to sizeof (void *). That ought to be OK. Maybe a gcc bug?
------- Comment From bergner.com 2012-10-25 00:53 EDT------- Luckily, it's a very small function too. Ryan, can you attach the dlfcn.i file along with the two sets of options to build it (both bad and good)? Maybe also attach the two dlfcn.os files as well in case I have a hard time getting the same alignment.
Created attachment 633458 [details] -save-temps analysis for O2 and O3 builds, as well as object files, and readelf output. ------- Comment (attachment only) From ryanarn.com 2012-10-25 16:16 EDT-------
------- Comment From ryanarn.com 2012-10-25 16:23 EDT-------

# diff -uNr bad_dlfcn.elf good_dlfcn.elf
--- bad_dlfcn.elf 2012-10-25 15:51:00.733486361 -0500
+++ good_dlfcn.elf 2012-10-25 15:51:01.333486421 -0500
@@ -41,7 +41,7 @@
 [ 8] .rela.opd RELA 0000000000000000 00001320
 0000000000000030 0000000000000018 24 7 8
 [ 9] .init_array INIT_ARRAY 0000000000000000 00000080
- 0000000000000008 0000000000000000 WA 0 0 16
+ 0000000000000008 0000000000000000 WA 0 0 8
 [10] .rela.init_array RELA 0000000000000000 00001350
 0000000000000018 0000000000000018 24 9 8
 [11] .debug_info PROGBITS 0000000000000000 00000088

# diff -uNr bad_dlfcn.s good_dlfcn.s
--- bad_dlfcn.s 2012-10-25 15:51:01.283486416 -0500
+++ good_dlfcn.s 2012-10-25 15:51:01.853486473 -0500
@@ -42,7 +42,7 @@
 .hidden __dlfcn_argc
 .comm __dlfcn_argc,4,4
 .section .init_array,"aw"
- .align 4
+ .align 3
 .type init_array, @object
 .size init_array, 8
 init_array:
------- Comment From ryanarn.com 2012-10-25 16:35 EDT------- It's pretty clear that the init_array is being told to quadword align. I'm not sure why considering the .i files are the same for both builds:

static void (*const init_array []) (int argc, char *argv[])
  __attribute__ ((section (".init_array"), aligned (sizeof (void *))))
  __attribute__ ((__used__)) = { init };
Created attachment 633580 [details] testcase ------- Comment on attachment From amodra.com 2012-10-25 22:55 EDT------- Compile with -m64 -O3 -mcpu=power7 -S to see the error. I see this on fc17 too with gcc (GCC) 4.7.0 20120525 (Red Hat 4.7.0-6)
------- Comment From bergner.com 2012-10-30 02:25 EDT------- This is http://gcc.gnu.org/PR53708 which is only fixed on mainline. I'll ask richi if he's willing to have this backported to the FSF 4.7 branch so the F18 toolchain can pick it up. BTW, the minimal test case is:

static void (*const init_array []) (void)
  __attribute__ ((section (".init_array"), aligned (sizeof (void *)), used)) = { 0 };

and compiling with -m64 -O3 -maltivec is enough to see the over alignment.
------- Comment From ryanarn.com 2012-11-02 15:18 EDT------- This is a blocker for Fedora, so I do hope the fix is backported. I don't think the workaround I've provided in GLIBC is desirable for a production distribution.
------- Comment From bergner.com 2012-11-02 16:28 EDT------- I've committed the correct "fix" to gcc mainline and the GCC release managers want me to wait a couple of days before committing it to the FSF 4.7 branch. Since I committed the mainline patch yesterday, I'll commit the FSF 4.7 patch tomorrow.
------- Comment From bergner.com 2012-11-03 14:33 EDT------- Committed to the FSF 4.7 branch and merged into the IBM 4.7 branch, so fixed.
------- Comment From bergner.com 2012-11-05 15:21 EDT------- Committed to the FSF 4.7 branch as revision 193121 here: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00068.html
Jakub pulled the appropriate fix into the Fedora gcc; dwa spun new builds of gcc for ppc this morning. I've got fresh glibc builds for ppc spinning now.
------- Comment From clnperez.com 2012-11-28 17:02 EDT------- Did those builds finish and/or get tested yet?
Those builds finished and appear to be OK: http://ppc.koji.fedoraproject.org/koji/packageinfo?packageID=2068 Note -22.fc18 and -24.fc18 builds. jeff