Bug 864138 - glibc-2.16-15.fc18.ppc64p7 renders system unusable
Summary: glibc-2.16-15.fc18.ppc64p7 renders system unusable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 18
Hardware: ppc64
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jeff Law
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F18Betappc
Reported: 2012-10-08 17:20 UTC by IBM Bug Proxy
Modified: 2016-11-24 15:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-11-06 21:30:12 UTC
Type: ---
Embargoed:


Attachments
faulty libdl.so.2 compiled with -O3 (157.11 KB, application/octet-stream)
2012-10-24 15:11 UTC, IBM Bug Proxy
Good libdl.so.2 built with -O2 (156.60 KB, application/octet-stream)
2012-10-24 15:11 UTC, IBM Bug Proxy
disassembly of bad_libdl.so.2 (1.30 MB, text/plain)
2012-10-24 15:12 UTC, IBM Bug Proxy
disassembly of good_libdl.so.2 (1.29 MB, text/plain)
2012-10-24 15:12 UTC, IBM Bug Proxy
relinked with -Wl,-Map,relink.map (12.86 KB, text/plain)
2012-10-24 18:52 UTC, IBM Bug Proxy
-Wl,-Map,bad.map for -O3 build (43.36 KB, text/plain)
2012-10-24 22:12 UTC, IBM Bug Proxy
-Wl,-Map,good.map for -O2 build (43.31 KB, text/plain)
2012-10-24 22:12 UTC, IBM Bug Proxy
-save-temps analysis for O2 and O3 builds, as well as object files, and readelf output. (103.02 KB, application/octet-stream)
2012-10-25 16:23 UTC, IBM Bug Proxy
testcase (200 bytes, text/x-csrc)
2012-10-25 23:02 UTC, IBM Bug Proxy


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 85823 0 None None None 2012-10-08 17:20:36 UTC

Description IBM Bug Proxy 2012-10-08 17:20:35 UTC
== Comment: #0 - Brent J. Baude <baude.com> - 2012-10-08 13:11:28 ==
The optimized ppc64p7 build of glibc-2.16-15.fc18 appears to have an issue with the loader.  It will render a system unusable if installed.  We didn't observe these problems in earlier test builds.

Comment 1 IBM Bug Proxy 2012-10-08 19:20:45 UTC
------- Comment From ryanarn.com 2012-10-08 19:11 EDT-------
Backtrace in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x000000004f769810 in call_init (l=l@entry=0x4f794930, argc=argc@entry=1, argv=argv@entry=0xffffffff150, env=env@entry=0xffffffff160) at dl-init.c:82
82		((init_t) addrs[j]) (argc, argv, env);
(gdb) bt
#0  0x000000004f769810 in call_init (l=l@entry=0x4f794930, argc=argc@entry=1, argv=argv@entry=0xffffffff150, env=env@entry=0xffffffff160) at dl-init.c:82
#1  0x000000004f769944 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=0x4f794930) at dl-init.c:53
#2  _dl_init (main_map=0x4f792bf0, argc=<optimized out>, argv=0xffffffff150, env=0xffffffff160) at dl-init.c:131
#3  0x000000004f75651c in ._dl_start_user () from /root/rpmbuild/BUILD/glibc-2.16-75f0d304/build-ppc64-redhat-linux/elf/ld64.so.1

Comment 2 IBM Bug Proxy 2012-10-12 10:40:25 UTC
------- Comment From kamaleshb.com 2012-10-12 10:38 EDT-------
*** Bug 85930 has been marked as a duplicate of this bug. ***

Comment 3 IBM Bug Proxy 2012-10-17 23:30:52 UTC
------- Comment From ryanarn.com 2012-10-17 23:25 EDT-------
Building GLIBC with -O3 -fvect-cost-model -fno-tree-vectorize eliminates the segmentation violations.  This is not THE solution but should point toward the problem.

Comment 4 IBM Bug Proxy 2012-10-24 15:00:30 UTC
------- Comment From ryanarn.com 2012-10-24 14:58 EDT-------
This issue has been identified as a linker and compiler issue under -O3 optimization.

For some reason the linker is inserting a NULL entry into libdl.so's init_array.  When the loader walks this array and invokes the init functions it dereferences this NULL pointer and crashes.  It's possible that the compiler is emitting some sort of bogus relocation that the linker then resolves into a null pointer.

The following patch is a workaround in the loader code to not dereference null init_array entries.  Ultimately this is NOT the correct solution:

--- 1-glibc-2.16-75f0d304/elf/dl-init.c 2012-10-23 17:44:52.081314190 -0500
+++ 2-glibc-2.16-75f0d304/elf/dl-init.c 2012-10-23 20:32:30.501460972 -0500
@@ -79,7 +79,9 @@

addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
for (j = 0; j < jm; ++j)
-       ((init_t) addrs[j]) (argc, argv, env);
+       /* Workaround linker bug which inserts a null entry at -O3.  */
+       if (addrs[j])
+         ((init_t) addrs[j]) (argc, argv, env);
}
}
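
For readers who want to see the effect of the guard outside the loader, the following is a minimal sketch of walking a table of initializers while skipping null slots. It is not the glibc code: `init_fn`, `sample_init`, `calls_made` and `run_init_array` are hypothetical names invented for this illustration, and the sketch only mirrors the shape of the loop in the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader's initializer type; the patch
   above calls each entry with (argc, argv, env) in the same way. */
typedef void (*init_fn)(int, char **, char **);

/* Counts how many initializers were actually invoked. */
int calls_made;

/* Example initializer used only by this sketch. */
void sample_init(int argc, char **argv, char **env)
{
    (void)argc;
    (void)argv;
    (void)env;
    ++calls_made;
}

/* Walk a table of initializers, skipping null slots instead of
   calling through them. Returns the number of slots skipped. */
size_t run_init_array(const init_fn *table, size_t count,
                      int argc, char **argv, char **env)
{
    size_t skipped = 0;

    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (table[i] == NULL) {   /* the check the workaround adds */
            ++skipped;
            continue;
        }
        table[i](argc, argv, env);
    }
    return skipped;
}
```

With a three-slot table whose middle slot is null, the two real initializers run and one slot is reported as skipped. Without the check, the call through the null slot is what produces the SIGSEGV in the backtrace.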

From an objdump:

Contents of section .init_array:
1fda0 00000000 00020050 00000000 00000000  .......P........
1fdb0 00000000 000201a0

Curiously, the disassembly of the .init_array section doesn't show the null entry, though that might be what the '...' indicates in the output:

Disassembly of section .init_array:

000000000001fda0 <__frame_dummy_init_array_entry>:
1fda0:       00 00 00 00     .long 0x0
1fda0: R_PPC64_RELATIVE *ABS*+0x20050
1fda4:       00 02 00 50     .long 0x20050
...

000000000001fdb0 <init_array>:
1fdb0:       00 00 00 00     .long 0x0
1fdb0: R_PPC64_RELATIVE *ABS*+0x201a0
1fdb4:       00 02 01 a0     .long 0x201a0

In a version of the library compiled with -O2 we don't see the null entry at all:

Contents of section .init_array:
1fdb0 00000000 00020050 00000000 000201a0  .......P........

Disassembly of section .init_array:

000000000001fdb0 <__frame_dummy_init_array_entry>:
1fdb0:       00 00 00 00     .long 0x0
1fdb0: R_PPC64_RELATIVE *ABS*+0x20050
1fdb4:       00 02 00 50     .long 0x20050

000000000001fdb8 <init_array>:
1fdb8:       00 00 00 00     .long 0x0
1fdb8: R_PPC64_RELATIVE *ABS*+0x201a0
1fdbc:       00 02 01 a0     .long 0x201a0
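
The two hex dumps above can be compared mechanically. The sketch below decodes a raw .init_array image as 64-bit big-endian slots and reports the first all-zero slot. The byte arrays are transcribed from the -O3 and -O2 dumps in this comment; the helper names `read_be64` and `first_null_slot` are made up for this illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 64-bit big-endian word from 8 raw bytes. */
uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

/* Return the index of the first all-zero 8-byte slot in a raw
   .init_array image, or -1 if every slot is populated. */
long first_null_slot(const unsigned char *image, size_t size)
{
    assert(size % 8 == 0);
    for (size_t i = 0; i < size / 8; ++i)
        if (read_be64(image + 8 * i) == 0)
            return (long)i;
    return -1;
}

/* Bytes transcribed from the -O3 (failing) dump: 0x1fda0..0x1fdb7. */
const unsigned char bad_init_array[24] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x50,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x01, 0xa0
};

/* Bytes transcribed from the -O2 (good) dump: 0x1fdb0..0x1fdbf. */
const unsigned char good_init_array[16] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x50,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x01, 0xa0
};
```

On the failing image the middle slot decodes to zero, while the good image has only the two populated slots (0x20050 and 0x201a0).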

I checked and there is not a bogus OPD entry in the failure case, just the dummy frame pointer amongst others and this, which is valid.

00000000000201a0 <init>:
201a0:       00 00 00 00     .long 0x0
201a0: R_PPC64_RELATIVE *ABS*+0x2bb0
201a4:       00 00 2b b0     .long 0x2bb0
201a8:       00 00 00 00     .long 0x0
201a8: R_PPC64_RELATIVE *ABS*+0x281b8
201ac:       00 02 81 b8     .long 0x281b8

I think much of this comes from dlfcn.os, but when I compare the good build to the bad build I see the exact same thing:

Contents of section .init_array:
0000 00000000 00000000

Disassembly of section .text:

0000000000000000 <.init>:
0:   e9 42 00 00     ld      r10,0(r2)
2: R_PPC64_TOC16_DS     .toc
4:   e9 22 00 08     ld      r9,8(r2)
6: R_PPC64_TOC16_DS     .toc+0x8
8:   90 6a 00 00     stw     r3,0(r10)
c:   f8 89 00 00     std     r4,0(r9)
10:   4e 80 00 20     blr
...

Disassembly of section .opd:

0000000000000000 <init>:
...
0: R_PPC64_ADDR64       .text
8: R_PPC64_TOC  *ABS*

Disassembly of section .init_array:

0000000000000000 <init_array>:
...
0: R_PPC64_ADDR64       .opd

So at this point, I'm not sure what's going on.

Comment 5 IBM Bug Proxy 2012-10-24 15:11:25 UTC
Created attachment 632839
faulty libdl.so.2 compiled with -O3


------- Comment (attachment only) From ryanarn.com 2012-10-24 15:07 EDT-------

Comment 6 IBM Bug Proxy 2012-10-24 15:11:42 UTC
Created attachment 632840
Good libdl.so.2 built with -O2


------- Comment (attachment only) From ryanarn.com 2012-10-24 15:07 EDT-------

Comment 7 IBM Bug Proxy 2012-10-24 15:12:00 UTC
Created attachment 632841
disassembly of bad_libdl.so.2


------- Comment (attachment only) From ryanarn.com 2012-10-24 15:08 EDT-------

Comment 8 IBM Bug Proxy 2012-10-24 15:12:23 UTC
Created attachment 632843
disassembly of good_libdl.so.2


------- Comment (attachment only) From ryanarn.com 2012-10-24 15:09 EDT-------

Comment 9 IBM Bug Proxy 2012-10-24 18:52:07 UTC
------- Comment From ryanarn.com 2012-10-24 18:41 EDT-------
Relinking libdl.so.2 with -Wl,-Map,foo.map indicates the following for the init_array linkage:

.rela.init_array
0x00000000000006f0       0x18 linker stubs

.init_array     0x000000000001fdd0        0x8
*(SORT(.init_array.*) SORT(.ctors.*))
*(.init_array)
.init_array    0x000000000001fdd0        0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
*(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)

Comment 10 IBM Bug Proxy 2012-10-24 18:52:21 UTC
Created attachment 632932
relinked with -Wl,-Map,relink.map


------- Comment (attachment only) From ryanarn.com 2012-10-24 18:43 EDT-------

Comment 11 IBM Bug Proxy 2012-10-24 22:02:20 UTC
------- Comment From ryanarn.com 2012-10-24 22:00 EDT-------
Comment on attachment 74738
failure case map file generated with -Wl,-Map,failure.map

This map file was generated incorrectly.

Comment 12 IBM Bug Proxy 2012-10-24 22:12:20 UTC
------- Comment From ryanarn.com 2012-10-24 22:04 EDT-------
The .init_array from the map file from the -O2 (good) build:

.init_array     0x000000000001fdb0       0x10
*(SORT(.init_array.*) SORT(.ctors.*))
*(.init_array)
.init_array    0x000000000001fdb0        0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
.init_array    0x000000000001fdb8        0x8 /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os)
*(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)

The .init_array from the map file from the -O3 (failure) build:

.init_array     0x000000000001fda0       0x18
*(SORT(.init_array.*) SORT(.ctors.*))
*(.init_array)
.init_array    0x000000000001fda0        0x8 /usr/lib/gcc/ppc64-redhat-linux/4.7.2/crtbeginS.o
*fill*         0x000000000001fda8        0x8
.init_array    0x000000000001fdb0        0x8 /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os)
*(EXCLUDE_FILE(*crtend?.o *crtend.o *crtbegin?.o *crtbegin.o) .ctors)

So the '*fill*' entry seems to be problematic.

Comment 13 IBM Bug Proxy 2012-10-24 22:12:35 UTC
Created attachment 633055
-Wl,-Map,bad.map for -O3 build


------- Comment (attachment only) From ryanarn.com 2012-10-24 22:06 EDT-------

Comment 14 IBM Bug Proxy 2012-10-24 22:12:52 UTC
Created attachment 633056
-Wl,-Map,good.map for -O2 build


------- Comment (attachment only) From ryanarn.com 2012-10-24 22:06 EDT-------

Comment 15 IBM Bug Proxy 2012-10-25 00:02:32 UTC
------- Comment From amodra.com 2012-10-24 23:58 EDT-------
Yes, that fill is why you have zeros in .init_array, and I'll bet the fill is there because /root/rpmbuild/BUILD/glibc-2.16-75f0d304/ryanarnbuild/dlfcn/libdl_pic.a(dlfcn.os) .init_array section is improperly aligned to a 16-byte boundary.  So check the object file section headers using readelf to verify my hypothesis, then you'll need to figure out why you're getting increased alignment on that object.  Hmm, I see dlfcn.c on my (a little out of date) copy of glibc sources explicitly generates the .init_array entry aligned to sizeof (void *).  That ought to be OK.  Maybe a gcc bug?
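
The arithmetic behind this hypothesis can be checked against the addresses in the two map files from comment 12. The following is only a sketch: `align_up` and `fill_bytes` are illustrative helpers, not linker functions, and the only inputs are the end address of the crtbeginS.o entry and the alignment the next input section asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Padding needed before an input section that must start on an
   align-byte boundary, given where the previous contribution ended. */
uint64_t fill_bytes(uint64_t end_of_previous, uint64_t align)
{
    return align_up(end_of_previous, align) - end_of_previous;
}
```

In the -O3 map the crtbeginS.o entry ends at 0x1fda8, so a 16-byte requirement gives fill_bytes(0x1fda8, 16) = 8 and places dlfcn.os at 0x1fdb0, matching the *fill* line. In the -O2 map the entry ends at 0x1fdb8, and an 8-byte requirement gives no padding at all.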

Comment 16 IBM Bug Proxy 2012-10-25 01:02:38 UTC
------- Comment From bergner.com 2012-10-25 00:53 EDT-------
Luckily, it's a very small function too.  Ryan, can you attach the dlfcn.i file along with the two sets of options to build it (both bad and good)?  Maybe attach the two dlfcn.os files as well in case I have a hard time getting the same alignment.

Comment 17 IBM Bug Proxy 2012-10-25 16:23:24 UTC
Created attachment 633458
-save-temps analysis for O2 and O3 builds, as well as object files, and readelf output.


------- Comment (attachment only) From ryanarn.com 2012-10-25 16:16 EDT-------

Comment 18 IBM Bug Proxy 2012-10-25 16:32:38 UTC
------- Comment From ryanarn.com 2012-10-25 16:23 EDT-------
# diff -uNr bad_dlfcn.elf good_dlfcn.elf
--- bad_dlfcn.elf	2012-10-25 15:51:00.733486361 -0500
+++ good_dlfcn.elf	2012-10-25 15:51:01.333486421 -0500
@@ -41,7 +41,7 @@
[ 8] .rela.opd         RELA             0000000000000000  00001320
0000000000000030  0000000000000018          24     7     8
[ 9] .init_array       INIT_ARRAY       0000000000000000  00000080
-       0000000000000008  0000000000000000  WA       0     0     16
+       0000000000000008  0000000000000000  WA       0     0     8
[10] .rela.init_array  RELA             0000000000000000  00001350
0000000000000018  0000000000000018          24     9     8
[11] .debug_info       PROGBITS         0000000000000000  00000088

# diff -uNr bad_dlfcn.s good_dlfcn.s
--- bad_dlfcn.s	2012-10-25 15:51:01.283486416 -0500
+++ good_dlfcn.s	2012-10-25 15:51:01.853486473 -0500
@@ -42,7 +42,7 @@
.hidden	__dlfcn_argc
.comm	__dlfcn_argc,4,4
.section	.init_array,"aw"
-	.align 4
+	.align 3
.type	init_array, @object
.size	init_array, 8
init_array:
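
The two diffs above agree with each other if the `.align` operand in the generated assembly is read as a power-of-two exponent, which is an assumption here but is consistent with the readelf alignments of 16 and 8. A tiny sketch of that conversion follows; `align_operand_to_bytes` is a name invented for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a power-of-two .align operand into a byte alignment,
   assuming the operand is an exponent rather than a byte count. */
uint64_t align_operand_to_bytes(unsigned exponent)
{
    assert(exponent < 64);
    return UINT64_C(1) << exponent;
}
```

Under that reading, `.align 4` in bad_dlfcn.s requests 16 bytes and `.align 3` in good_dlfcn.s requests 8 bytes, the same values that appear in the section header diff.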

Comment 19 IBM Bug Proxy 2012-10-25 16:42:43 UTC
------- Comment From ryanarn.com 2012-10-25 16:35 EDT-------
It's pretty clear that the init_array is being told to quadword align.  I'm not sure why, considering the .i files are the same for both builds:

static void (*const init_array []) (int argc, char *argv[])
__attribute__ ((section (".init_array"), aligned (sizeof (void *))))
__attribute__ ((__used__)) =
{
init
};

Comment 20 IBM Bug Proxy 2012-10-25 23:02:50 UTC
Created attachment 633580
testcase


------- Comment on attachment From amodra.com 2012-10-25 22:55 EDT-------


Compile with -m64 -O3 -mcpu=power7 -S to see the error.  I see this on fc17 too with
gcc (GCC) 4.7.0 20120525 (Red Hat 4.7.0-6)

Comment 21 IBM Bug Proxy 2012-10-30 02:32:38 UTC
------- Comment From bergner.com 2012-10-30 02:25 EDT-------
This is http://gcc.gnu.org/PR53708 which is only fixed on mainline.  I'll ask richi if he's willing to have this backported to the FSF 4.7 branch so the F18 toolchain can pick it up.

BTW, the minimal test case is:

static void (*const init_array []) (void)
__attribute__ ((section (".init_array"), aligned (sizeof (void *)), used)) = { 0 };

and compiling with -m64 -O3 -maltivec is enough to see the over-alignment.

Comment 22 IBM Bug Proxy 2012-11-02 15:23:01 UTC
------- Comment From ryanarn.com 2012-11-02 15:18 EDT-------
This is a blocker for Fedora, so I do hope the fix is backported.  I don't think the workaround I've provided in GLIBC is desirable for a production distribution.

Comment 23 IBM Bug Proxy 2012-11-02 16:32:33 UTC
------- Comment From bergner.com 2012-11-02 16:28 EDT-------
I've committed the correct "fix" to gcc mainline and the GCC release managers want me to wait a couple of days before committing it to the FSF 4.7 branch.  Since I committed the mainline patch yesterday, I'll commit the FSF 4.7 patch tomorrow.

Comment 24 IBM Bug Proxy 2012-11-03 14:43:29 UTC
------- Comment From bergner.com 2012-11-03 14:33 EDT-------
Committed to the FSF 4.7 branch and merged into the IBM 4.7 branch, so fixed.

Comment 25 IBM Bug Proxy 2012-11-05 15:33:49 UTC
------- Comment From bergner.com 2012-11-05 15:21 EDT-------
Committed to the FSF 4.7 branch as revision 193121 here:

http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00068.html

Comment 26 Jeff Law 2012-11-06 21:30:12 UTC
Jakub pulled the appropriate fix into the Fedora gcc; dwa spun new builds of gcc for ppc this morning.  I've got fresh glibc builds for ppc spinning now.

Comment 27 IBM Bug Proxy 2012-11-28 17:14:07 UTC
------- Comment From clnperez.com 2012-11-28 17:02 EDT-------
Did those builds finish and/or get tested yet?

Comment 28 Jeff Law 2012-11-29 18:08:49 UTC
Those builds finished and appear to be OK:

http://ppc.koji.fedoraproject.org/koji/packageinfo?packageID=2068

Note the -22.fc18 and -24.fc18 builds.

jeff

