elfutils-0.155-1 fails 5 tests in a sparc64 environment (Big Endian) -------------------------------------------------------------------- (This is just for tracking atm until I can wrap a debugger around each test and work through them and determine what they should ACTUALLY produce as output) --- readelf.out 2013-01-03 03:47:30.202707226 -0500 +++ - 2013-01-03 03:47:30.206443707 -0500 @@ -1 +1,8 @@ -./test-subr.sh: line 73: 63075 Bus error (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@" + +Index of archive 'testfile19.index' has 4 entries: +Archive member 'u1.o' contains: + a +Archive member 'u2.o' contains: + aa +Archive member 'u3.o' contains: + a FAIL: run-readelf-test4.sh -------------------------------------------------------------------------------- allregs: 63101: No such file or directory FAIL: run-native-test.sh -------------------------------------------------------------------------------- ./dwfl-bug-fd-leak: dwfl_linux_proc_report: No such file or directory FAIL: dwfl-bug-fd-leak -------------------------------------------------------------------------------- --- unstrip.out 2013-01-03 03:47:32.372707180 -0500 +++ - 2013-01-03 03:47:32.376872122 -0500 @@ -1 +1,6 @@ -./test-subr.sh: line 73: 63367 Bus error (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@" +0x10000000+0x20000 979b7a26747cc09bd84a42b311b5288c704baea5@0x10000174 . - [exe] +0x100000+0x10000 708b900b05176964512a6b0fe90c2a0c9d73d726@0x100334 . 
- linux-vdso32.so.1 +0xfd50000+0x30000 3f7d21508470322d2f47acddc20ab10516edba99@0xfd50164 /lib/librt.so.1 - librt.so.1 +0xfdf0000+0x1c0000 edf3dd232e09d01b90683889bd16b9406c52d4de@0xfdf0184 /lib/libc.so.6 - libc.so.6 +0xfdb0000+0x40000 f6ee91d4c629bc7dacc10534cb30056914e7e0b5@0xfdb0164 /lib/libpthread.so.0 - libpthread.so.0 +0xffb0000+0x50000 edec437a85026a1cf8cda94003706202733130c1@0xffb0124 /lib/ld.so.1 - ld.so.1 FAIL: run-unstrip-n.sh -------------------------------------------------------------------------------- --- readelf.out 2013-01-03 03:47:32.577707176 -0500 +++ - 2013-01-03 03:47:32.581283035 -0500 @@ -1 +1,11 @@ -./test-subr.sh: line 73: 63420 Bus error (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@" + +Index of archive 'testarchive64.a' has 7 entries: +Archive member 'aaa.o' contains: + aaa +Archive member 'bbb.o' contains: + bbb + bbb2 +Archive member 'ccc.o' contains: + ccc + ccc2 + ccc3 FAIL: run-test-archive64.sh Phil =--=
pfft,... ok run-native-test.sh will fail because it is expecting run-allregs.sh to run and leave an 'allregs' binary around,.. which would be nice except it's INTEL SPECIFIC.... that needs to be skipped in run-native-test.sh when it's not running on an intel compatible box. Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ I've a horrible suspicion that it wasn't meant to be able to write any file at all. The three remaining problems are all Bus errors so I'm hoping they're all commonly linked to one issue. With luck it'll be simple pointer math ignoring a big-endian environment.
(In reply to comment #1) > pfft,... ok run-native-test.sh will fail because it is expecting > run-allregs.sh to run and leave an 'allregs' binary around,.. which would be > nice except it's INTEL SPECIFIC.... that needs to be skipped in > run-native-test.sh when it's not running on an intel compatible box. allregs should be a native binary (see tests/Makefile.am check_PROGRAMS).
(In reply to comment #1) > Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ It will test the dwfl_linux_proc_report (dwfl, pid) call which will try to open /proc/<pid>/maps which seems to fail for you for some reason.
(In reply to comment #1) > pfft,... ok run-native-test.sh will fail because it is expecting > run-allregs.sh to run and leave an 'allregs' binary around,.. which would be > nice except it's INTEL SPECIFIC.... that needs to be skipped in > run-native-test.sh when it's not running on an intel compatible box. The binary is created by the build machinery, not by another test. It's also not Intel-specific. The message you are seeing comes from libdwfl and is likely due to one of /proc/$PID/{maps,mem,auxv} missing or being unreadable. Do those normally work on your installation? > Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ I've a > horrible suspicion that it wasn't meant to be able to write any file at all Same as above. > The three remaining problems are all Bus errors so I'm hoping they're all > commonly linked to one issue. With luck it'll be simple pointer math > ignoring a big-endian environment. It might be an unaligned access. elfutils works fine on s390 and PowerPC, which are both big-endian machines under Linux.
(In reply to comment #3) > (In reply to comment #1) > > Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ > > It will test the dwfl_linux_proc_report (dwfl, pid) call which will try to > open /proc/<pid>/maps which seems to fail for you for some reason. Ah,.. that would be because I chrooted into the environment instead of using 'mock --shell' when I last ran the tests (--shell tends to get the termcap slightly wrong and I end up bailing out at inconvenient moments as a result), so DEBUG util.py:307: Executing command: ['/bin/mount', '-n', '-t', 'proc', 'proc', '/var/lib/mock/fc18-rebuild/root/proc'] with env {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOSTNAME': 'mock', 'HOME': '/builddir', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin'} never ran and thus no /proc was available for the test. <mock-chroot>[root@localhost elfutils-0.155]# make check TESTS="dwfl-bug-fd-leak" ... PASS: dwfl-bug-fd-leak ============= 1 test passed ============= Ok, that one is down to me. I'll see about adding a patch to check that /proc is actually mounted though.
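For reference, a check along those lines could be as small as the sketch below. This is only an illustration, not the patch mentioned above: path_is_readable is a hypothetical helper name, and a real fix might just as well live in the shell test scripts rather than in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of elfutils): return true if PATH
   exists and is readable by the calling process.  A test could call
   this with "/proc/self/maps" and skip itself, rather than fail,
   when /proc is not mounted in the chroot.  */
static bool
path_is_readable (const char *path)
{
  assert (path != NULL);
  return access (path, R_OK) == 0;
}
```

A caller would typically test path_is_readable ("/proc/self/maps") before invoking anything that goes through dwfl_linux_proc_report.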
(In reply to comment #2) > (In reply to comment #1) > > pfft,... ok run-native-test.sh will fail because it is expecting > > run-allregs.sh to run and leave an 'allregs' binary around,.. which would be > > nice except it's INTEL SPECIFIC.... that needs to be skipped in > > run-native-test.sh when it's not running on an intel compatible box. > > allregs should be a native binary (see tests/Makefile.am check_PROGRAMS). Hurm, that seems to also require /proc to be mounted? <mock-chroot>[root@localhost elfutils-0.155]# make check TESTS="run-native-test.sh" ... PASS: run-native-test.sh ============= 1 test passed =============
(In reply to comment #0) > --- readelf.out 2013-01-03 03:47:30.202707226 -0500 Now that I've actually slept and can think i some small capacity,... set -xv in the script and reran so I can see whats being called where/when --- readelf.out 2013-01-03 12:31:21.502040278 -0500 +++ - 2013-01-03 12:31:21.505178510 -0500 @@ -1,4 +1,8 @@ -+ built_testrun ../src/readelf -c testfile19.index -+ LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm -+ ../src/readelf -c testfile19.index -./test-subr.sh: line 73: 11906 Bus error (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@" + +Index of archive 'testfile19.index' has 4 entries: +Archive member 'u1.o' contains: + a +Archive member 'u2.o' contains: + aa +Archive member 'u3.o' contains: + a FAIL: run-readelf-test4.sh --------------------------------------------------------------------- Lets stick gdb around that First lets get the testfile unpacked <mock-chroot>[root@localhost /]# cd /builddir/build/BUILD/elfutils-0.155/tests <mock-chroot>[root@localhost tests]# bzip2 -cd testfile19.index.bz2 > testfile19.index Lets just check that this is really where it's going wrong export LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" strace -f /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0+\0\0\0\1\0\0\0\0\0\0028@"..., 832) = 832 fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 2588256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xfffff8010047c000 mprotect(0xfffff801005ea000, 1048576, PROT_NONE) = 0 mmap(0xfffff801006ea000, 32768, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16e000) = 0xfffff801006ea000 mmap(0xfffff801006f2000, 7776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xfffff801006f2000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xfffff80100002000 mprotect(0xfffff801006ea000, 16384, PROT_READ) = 0 mprotect(0xfffff80100362000, 8192, PROT_READ) = 0 mprotect(0xfffff80100256000, 8192, PROT_READ) = 0 mprotect(0xfffff8010013a000, 8192, PROT_READ) = 0 mprotect(0x226000, 8192, PROT_READ) = 0 mprotect(0xfffff801184e8000, 8192, PROT_READ) = 0 munmap(0xfffff8010025c000, 13570) = 0 brk(0) = 0x22a000 brk(0x24c000) = 0x24c000 brk(0) = 0x24c000 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 104822576, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff801006f4000 close(3) = 0 open("testfile19.index", O_RDONLY) = 3 fcntl(3, F_GETFL) = 0x40000 (flags O_RDONLY|O_LARGEFILE) fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 3152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff8010025c000 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010025c044} --- +++ killed by SIGBUS (core dumped) +++ Ok,.. looks like it opened the test file then set set the file attributes, then went off to call mmap()?.. then again that might just be a macro expansion that calls mmap, at which point it falls apart.. fine, lets try wrapping gdb around it. <mock-chroot>[root@localhost tests]# gdb ../src/readelf GNU gdb (GDB) Fedora (7.5.0.20120926-25.fc18) Copyright (C) 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "sparc64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /builddir/build/BUILD/elfutils-0.155/src/readelf...done. 
(gdb) (gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" (gdb) show environment LD_LIBRARY_PATH LD_LIBRARY_PATH = "../libdw:../backends:../libelf:../libasm" (gdb) set args -c testfile19.index (gdb) show args Argument list to give program being debugged when it is started is "-c testfile19.index". (gdb) run Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] thread_get_info_callback: cannot get thread info: generic error Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-7.fc18.sparc64 elfutils-libs-0.152-1.fc12.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64 xz-compat-libs-5.1.2-2alpha.fc18.sparc64 zlib-1.2.7-9.fc18.sparc64 (gdb) nurgh,... sec,. lemme get all those installed and restart. (bootstrap environments tend to be messy) Hurm,.. don;t have all the debuginfo',.. 
with luck this is enough 1:zlib-debuginfo-1.2.7-9.fc18 ################################# [ 25%] 2:xz-debuginfo-5.1.2-2alpha.fc18 ################################# [ 50%] 3:glibc-debuginfo-2.16.90-40.fc18 ################################# [ 75%] 4:bzip2-debuginfo-1.0.6-7.fc18 ################################# [100%] (gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" (gdb) set args -c testfile19.index (gdb) run Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] thread_get_info_callback: cannot get thread info: generic error Missing separate debuginfos, use: debuginfo-install elfutils-libs-0.152-1.el6.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64 (gdb) where Target is executing. Gurr,.. threaded debugging,.. not working as expected (gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" (gdb) set args -c testfile19.index (gdb) break main Breakpoint 1 at 0x103140: file readelf.c, line 249. (gdb) run Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". 
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device] thread_get_info_callback: cannot get thread info: generic error Missing separate debuginfos, use: debuginfo-install elfutils-libs-0.152-1.el6.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64 (gdb) where Target is executing. (gdb) thread 1 [Switching to thread 1 (Thread 0xfffff8010002ab90 (LWP 12146))](running) (gdb) where Target is executing. .... give up on gdb then 8/
Chasing this using the old-school printf("XYZ\n"); method.

static void
dump_archive_index (Elf *elf, const char *fname)
{
printf("2a\n");
  size_t narsym;
  const Elf_Arsym *arsym = elf_getarsym (elf, &narsym);
printf("2b\n");

(2b doesn't show.) So where is elf_getarsym() from? libelf/elf_getarsym.c by the look of it. In a way this might be a good thing, as it would be shared with the other tests that failed, assuming they all have the same issue.

------------------------------------------------------
libelf/elf_getarsym.c

3g
3m
bash-4.2#

      int w = index64_p ? 8 : 4;
printf("3g\n");

      /* We have an archive.  The first word in there is the number of
	 entries in the table.  */
      uint64_t n;
      size_t off = elf->start_offset + SARMAG + sizeof (struct ar_hdr);
printf("3m\n");

      if (read_number_entries (&n, elf, &off, index64_p) < 0)
	{
	  /* Cannot read the number of entries.  */
	  __libelf_seterrno (ELF_E_NO_INDEX);
printf("3q\n");
	  goto out;
	}
printf("3n\n");

Neither 3q nor 3n shows up, so follow into read_number_entries().

------------------------------------------------------

static int
read_number_entries (uint64_t *nump, Elf *elf, size_t *offp, bool index64_p)
{
  union u
  {
    uint64_t ret64;
    uint32_t ret32;
  } u;

Oh,.. union 8/ I remember all kinds of warnings about unions on big-endian machines... Ok, let's get the show started, more printf()s.

4a
4b
4c
bash-4.2#

printf("4a\n");
  size_t w = index64_p ? 8 : 4;
printf("4b\n");
  if (elf->map_address != NULL)
    {
printf("4c\n");
    u = *(union u *) (elf->map_address + *offp);
printf("4d\n");

Ok, we arrive here, and this is apparently where bad things happen: 4c prints, 4d never does. (eyes glazing over) Ummm,....
Created attachment 672206 [details] elf_getarsym patch for unaligned access Please try the attached patch and see if it helps.
With the patch applied (and the printf()s still in place):

printf("1a\n");
  size_t w = index64_p ? 8 : 4;
printf("1b\n");
  if (elf->map_address != NULL)
    {
printf("1c\n");
    /* u = *(union u *) (elf->map_address + *offp); */
    memcpy (&u, elf->map_address + *offp, sizeof u);
printf("1d\n");

1a
1b
1c
1d

Ok, let's remove the printf()s and run the test again.

------------------------------------------
bash-4.2# (cd /builddir/build/BUILD/elfutils-0.155 ; make check TESTS="run-readelf-test4.sh")
make[2]: Entering directory `/builddir/build/BUILD/elfutils-0.155/tests'
PASS: run-readelf-test4.sh
=============
1 test passed
=============

Huzzah! Ok, let's run the full suite and see if the other two shared the same fault.

Hellfire,.. no, they're different 8/ ... -sigh- more printf debugging ... it's gonna be a long day.

Still, one down isn't a bad thing 8) Thanks. I'll keep digging here.

Phil =--=
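For anyone following along, here is why the memcpy version works where the cast did not. The first strace shows the fault at offset 0x44 (68) into the mapping, which is SARMAG (8) plus sizeof (struct ar_hdr) (60), and 68 is only 4-byte aligned. Dereferencing a union that contains a uint64_t lets the compiler emit an 8-byte load there. The sketch below is not the attached patch; read_count_unaligned is a hypothetical name, and byte-order handling is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 4- or 8-byte count from an address
   that may not be naturally aligned.  memcpy copies bytes, so it has
   no alignment requirement on its source.  The value is returned
   exactly as stored; any byte swapping is left to the caller.  */
static uint64_t
read_count_unaligned (const unsigned char *p, size_t width)
{
  assert (width == 4 || width == 8);
  if (width == 8)
    {
      uint64_t v64;
      memcpy (&v64, p, sizeof v64);
      return v64;
    }
  uint32_t v32;
  memcpy (&v32, p, sizeof v32);
  return v32;
}
```

Compilers generally turn a fixed-size memcpy like this into plain loads on targets that tolerate unaligned access, so the cost on x86 should be negligible.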
Next up,.. lets follow through on the run-test-archive64.sh test LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" ../src/readelf -c testarchive64.a @@ -1,2 +1,11 @@ -Running LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" ../src/readelf -c testarchive64.a -./test-subr.sh: line 73: 53096 Bus error (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@" + +Index of archive 'testarchive64.a' has 7 entries: +Archive member 'aaa.o' contains: + aaa +Archive member 'bbb.o' contains: + bbb + bbb2 +Archive member 'ccc.o' contains: + ccc + ccc2 + ccc3 ----------------------------------------------------- unpack the testfile bash-4.2# cd /builddir/build/BUILD/elfutils-0.155/tests bash-4.2# bzip2 -cd testarchive64.a.bz2 > testarchive64.a check a run with strace for anything useful bash-4.2# export LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" bash-4.2# strace -f ../src/readelf -c testarchive64.a open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 104822576, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff801006f4000 close(3) = 0 open("testarchive64.a", O_RDONLY) = 3 fcntl(3, F_GETFL) = 0x40000 (flags O_RDONLY|O_LARGEFILE) fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 4360, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff8010025c000 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010025c04c} --- +++ killed by SIGBUS (core dumped) +++ Bus error (core dumped) Ok,.. time to go hunting wabbits. I note that this is still "../src/readelf -c" being run same as above for run-readelf-test4.sh. Lets see where it leads. actually the trace is pretty much identical except further along. 
4k
4k2
4k2a
bash-4.2#

      if (elf->map_address == NULL)
	{
	  file_data = alloca (sz);

	  ar_sym_len += index_size - n * w;
	  Elf_Arsym *newp = (Elf_Arsym *) realloc (elf->state.ar.ar_sym,
						   ar_sym_len);
	  if (newp == NULL)
	    {
	      free (elf->state.ar.ar_sym);
	      elf->state.ar.ar_sym = NULL;
	      __libelf_seterrno (ELF_E_NOMEM);
	      goto out;
	    }
	  elf->state.ar.ar_sym = newp;

	  char *new_str = (char *) (elf->state.ar.ar_sym + n + 1);

	  /* Now read the data from the file.  */
	  if ((size_t) pread_retry (elf->fildes, file_data, sz, off) != sz
	      || ((size_t) pread_retry (elf->fildes, new_str,
					index_size - sz, off + sz)
		  != index_size - sz))
	    {
	      /* We were not able to read the data.  */
	      free (elf->state.ar.ar_sym);
	      elf->state.ar.ar_sym = NULL;
	      __libelf_seterrno (ELF_E_NO_INDEX);
	      goto out;
	    }

	  str_data = (char *) new_str;
	}
      else
	{
-->	  printf("4k\n");
	  file_data = (void *) (elf->map_address + off);
	  str_data = (char *) (elf->map_address + off + sz);
	}

      /* Now we can build the data structure.  */
      Elf_Arsym *arsym = elf->state.ar.ar_sym;
      for (size_t cnt = 0; cnt < n; ++cnt)
	{
	  arsym[cnt].as_name = str_data;
	  printf("4k2\n");
	  if (index64_p)
	    {
	      printf ("4k2a - cnt = %d \n", (int)cnt);
**>	      uint64_t tmp = file_data->u64[cnt];
	      printf ("4k2b\n");
	      if (__BYTE_ORDER == __LITTLE_ENDIAN)
		tmp = bswap_64 (tmp);
	      printf ("4k2c\n");

	      arsym[cnt].as_off = tmp;

*bang*
Sorry, I should have indicated more clearly that the listing above is from libelf/elf_getarsym.c.
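The strace for testarchive64.a faults at offset 0x4c (76) into the mapping. Assuming the archive starts at offset 0, that is SARMAG (8) + sizeof (struct ar_hdr) (60) + the 8-byte entry count, so the 64-bit offset table starts on a 4-byte boundary and file_data->u64[cnt] is a misaligned 8-byte load. One way around it, sketched below with hypothetical helper names (load_be64, index64_entry), is to assemble each big-endian entry from individual bytes, which also removes the need for the bswap_64 on little-endian hosts. This is an illustration only, not the fix that went upstream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a 64-bit value from 8 big-endian bytes.
   Only byte loads are used, so P may have any alignment and the
   result is the same on big- and little-endian hosts.  */
static uint64_t
load_be64 (const unsigned char *p)
{
  assert (p != NULL);
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | p[i];
  return v;
}

/* Hypothetical helper: return entry CNT of a 64-bit archive index
   offset table that starts at TABLE.  */
static uint64_t
index64_entry (const unsigned char *table, size_t cnt)
{
  return load_be64 (table + cnt * 8);
}
```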
Hurm,.. I'm looking at the run-unstrip-n.sh test it was able to chew though the testcore-rtlib file fine but abended while processing the testcore-rtlib-ppc corefile ----------------------------------------------------- bash-4.2# export LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm bash-4.2# ../src/unstrip -n --core=testcore-rtlib 0x8048000+0x2000 f1c600bc36cb91bf01f9a63a634ecb79aa4c3199@0x8048178 . - [exe] 0xf77d6000+0x1000 676560b1b765cde9c2e53f134f4ee354ea894747@0xf77d6210 . - linux-gate.so.1 0xf77b3000+0x9000 c6c5b5e35ab9589d4762ac85b4bd56b1b2720e37@0xf77b3164 /lib/librt.so.1 - librt.so.1 0xf7603000+0x1b0000 0b9bf374699e141e5dfc14757ff42b8c2373b4de@0xf7603184 /lib/libc.so.6 - libc.so.6 0xf75e9000+0x1a000 29a103420abe341e92072fb14274e250e4072148@0xf75e9164 /lib/libpthread.so.0 - libpthread.so.0 0xf77d7000+0x21000 6d2cb32650054f1c176d01d48713a4a5e5e84c1a@0xf77d7124 /lib/ld-linux.so.2 - ld-linux.so.2 ----------------------------------------------------- bash-4.2# ../src/unstrip -n --core=testcore-rtlib-ppc Bus error (core dumped) ----------------------------------------------------- Tracking,.. open("testcore-rtlib-ppc", O_RDONLY) = 3 fcntl(3, F_GETFL) = 0x40000 (flags O_RDONLY|O_LARGEFILE) fstat64(3, {st_mode=0, st_size=0, ...}) = 0 mmap(NULL, 1376256, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xfffff80106aec000 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff80106aec41c} --- +++ killed by SIGBUS (core dumped) +++ Bus error (core dumped) Oh joy,.. 
meh I guess I should really use __line__, __file__, __func__ this time [mockbuild@localhost elfutils-0.155]$ (cd /builddir/build/BUILD/elfutils-0.155; make ; bzip2 -cd tests/testcore-rtlib-ppc.bz2 > tests/testcore-rtlib-ppc ; (cd tests; LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm ../src/unstrip -n --core=testcore-rtlib-ppc)) Line 2222 Line 2224 Line 2241 Line 2251 Line 2285 [mockbuild@localhost elfutils-0.155]$ 2285 printf("Line %d\n", __LINE__); 2286 int remaining; 2287 struct arg_info info = { .args = NULL }; 2288 error_t result = argp_parse (&argp, argc, argv, 0, &remaining, &info); 2289 printf("Line %d\n", __LINE__); Line 2251 Line 2285 Line 2288 The following assumes gdb isn't lying... (gdb) set environment LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm (gdb) set args -n --core=testcore-rtlib-ppc (gdb) break 2288 Breakpoint 1 at 0x1020d0: file unstrip.c, line 2288. (gdb) run Breakpoint 1, main (argc=<optimized out>, argv=0x7fefffff6a8) at unstrip.c:2289 2289 error_t result = argp_parse (&argp, argc, argv, 0, &remaining, &info); (gdb) set print pretty on (gdb) p argp $4 = { options = 0x108a70 <options>, parser = 0x1034a0 <parse_opt>, args_doc = 0x1081d8 "STRIPPED-FILE DEBUG-FILE\n[MODULE...]", doc = 0x108200 "Combine stripped files with separate symbols and debug information.\vThe first form puts the result in DEBUG-FILE if -o was not given.\n\nMODULE arguments give file name patterns matching modules to proc"..., children = 0x7fefffff2e0, help_filter = 0x0, argp_domain = 0x0 } (gdb) print info $8 = { output_file = 0x0, output_dir = 0x0, dwfl = 0x0, args = 0x0, list = false, all = false, ignore = false, modnames = false, match_files = false, relocate = false } (gdb) step Program received signal SIGBUS, Bus error. 
auxv_format_probe (elfdata=<optimized out>, elfclass=<optimized out>, size=<optimized out>, auxv=<optimized out>) at link_map.c:107 107 if (check64 (i)) (gdb) where #0 auxv_format_probe (elfdata=<optimized out>, elfclass=<optimized out>, size=<optimized out>, auxv=<optimized out>) at link_map.c:107 #1 dwfl_link_map_report (auxv=0xfffff8010081041c, auxv_size=200, memory_callback=memory_callback@entry=0xfffff80100269a20 <dwfl_elf_phdr_memory_callback>, memory_callback_arg=memory_callback_arg@entry=0x20c590) at link_map.c:614 #2 0xfffff8010026a244 in dwfl_core_file_report (dwfl=dwfl@entry=0x20c520, elf=0x20c590) at core-file.c:454 #3 0xfffff80100261b18 in parse_opt (key=<optimized out>, arg=0x7fefffff8cb "testcore-rtlib-ppc") at argp-std.c:207 #4 0xfffff801006965a0 in argp_parse () from /lib64/libc.so.6 #5 0x00000000001020ec in main (argc=<optimized out>, argv=0x7fefffff6a8) at unstrip.c:2289 (gdb) list 102 return false; 103 } 104 105 for (size_t i = 0; i < size / sizeof (Elf64_auxv_t); ++i) 106 { 107 if (check64 (i)) 108 { 109 *elfclass = ELFCLASS64; 110 return true; 111 } (gdb) print i $9 = 0 [mockbuild@localhost elfutils-0.155]$ vi libdwfl/link_map.c +107 60 const union 61 { 62 char buf[size]; 63 Elf32_auxv_t a32[size / sizeof (Elf32_auxv_t)]; 64 Elf64_auxv_t a64[size / sizeof (Elf64_auxv_t)]; 65 } *u = auxv; 66 67 inline bool check64 (size_t i) 68 { 69 if (u->a64[i].a_type == BE64 (PROBE_TYPE) 70 && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64)) 71 { 72 *elfdata = ELFDATA2MSB; 73 return true; 74 } 75 76 if (u->a64[i].a_type == LE64 (PROBE_TYPE) 77 && u->a64[i].a_un.a_val == LE64 (PROBE_VAL64)) 78 { 79 *elfdata = ELFDATA2LSB; 80 return true; 81 } 82 83 return false; 84 } Oh aye,.. here we go with unions again 8/
Ok having rebuilt the libdwfl as -O0 -g3 without FORTIFY_SOURCE and -fno-inline,.... -grumble- Program received signal SIGBUS, Bus error. 0xfffff801002736b0 in check64 (i=0) at link_map.c:70 70 if (u->a64[i].a_type == BE64 (PROBE_TYPE) Missing separate debuginfos, use: debuginfo-install glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64 (gdb) set print pretty on (gdb) list 65 } *u = auxv; 66 67 /* inline bool check64 (size_t i) */ 68 bool check64 (size_t i) 69 { 70 if (u->a64[i].a_type == BE64 (PROBE_TYPE) 71 && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64)) 72 { 73 *elfdata = ELFDATA2MSB; 74 return true; (gdb) info locals u = 0xfffff8010082041c elfdata = 0x7feffffea57 "" (gdb) print u->a64[0] $2 = { a_type = 94489280534, a_un = { a_val = 94489280534 } } (gdb) info macro BE64 Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:43 #define BE64(x) (x) (gdb) info macro PROBE_TYPE Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:37 #define PROBE_TYPE AT_PHENT (gdb) info macro AT_PHENT Defined at ./../libelf/elf.h:953 included at ./../libelf/libelf.h:35 included at ./../libelf/gelf.h:32 included at ./../libdw/libdw.h:32 included at ./libdwfl.h:32 included at /builddir/build/BUILD/elfutils-0.155/libdwfl/libdwflP.h:35 included at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:30 #define AT_PHENT 4 Hurm guessing that thats not really a decimal value,.. switch to hex (gdb) print /x u->a64[i].a_type $4 = 0x1600000016 (gdb) print /x u->a64[i].a_un.a_val $5 = 0x1600000016 (gdb) info macro PROBE_VAL64 Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:39 #define PROBE_VAL64 sizeof (Elf64_Phdr) erk,. ok sooo whats the size of that? (gdb) print sizeof(Elf64_Phdr) $6 = 56 soooo that one line should demangle to if (0x1600000016 == 4 && 0x1600000016 == 56) = if ( 0 && 0 ) I don't get it,.. why does that produce a bus error? Program received signal SIGBUS, Bus error. 
0xfffff801002736b0 in check64 (i=0) at link_map.c:70 70 if (u->a64[i].a_type == BE64 (PROBE_TYPE) (gdb) print /x u->a64[i].a_type $7 = 0x1600000016 (gdb) print /x BE64 (PROBE_TYPE) $8 = 0x4 (gdb) print /x u->a64[i].a_un.a_val $9 = 0x1600000016 (gdb) print BE64 (PROBE_VAL64) $10 = 56 ideas? anyone?
Condensed the above down to libdwfl/link_map.c:70

Program received signal SIGBUS, Bus error.
0xfffff801002736b0 in check64 (i=0) at link_map.c:70
70	      if (u->a64[i].a_type == BE64 (PROBE_TYPE)
Missing separate debuginfos, use: debuginfo-install glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64
(gdb) print /x (u->a64[i].a_type == BE64 (PROBE_TYPE) && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64))
$1 = 0x0
(gdb) step

Program terminated with signal SIGBUS, Bus error.
The program no longer exists.

.... I'm bemused/confused/terribly terribly frustrated. The expression evaluates cleanly to 0 in gdb, so HOW does this create a bus error?
Oh, as a final thought, here is the disassembly/registers [mockbuild@localhost tests]$ ulimit -c unlimited ; LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm strace -f ../src/unstrip -n --core=testcore-rtlib-ppc ... mmap(NULL, 1376256, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xfffff80100704000 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010070441c} --- +++ killed by SIGBUS (core dumped) +++ Bus error (core dumped) [mockbuild@localhost tests]$ gdb ../src/unstrip core.37898 Core was generated by `../src/unstrip -n --core=testcore-rtlib-ppc'. Program terminated with signal 10, Bus error. #0 0xfffff801001536b0 in check64 (i=0) at link_map.c:69 69 if (u->a64[i].a_type == BE64 (PROBE_TYPE) (gdb) disassemble check64 Dump of assembler code for function check64: 0xfffff80100153698 <+0>: save %sp, -176, %sp 0xfffff8010015369c <+4>: stx %i0, [ %fp + 0x87f ] 0xfffff801001536a0 <+8>: mov %g5, %i5 0xfffff801001536a4 <+12>: ldx [ %i5 + 8 ], %g2 0xfffff801001536a8 <+16>: ldx [ %fp + 0x87f ], %g1 0xfffff801001536ac <+20>: sllx %g1, 4, %g1 => 0xfffff801001536b0 <+24>: ldx [ %g2 + %g1 ], %g1 0xfffff801001536b4 <+28>: cmp %g1, 4 0xfffff801001536b8 <+32>: bne %xcc, 0xfffff801001536f8 <check64+96> 0xfffff801001536bc <+36>: nop 0xfffff801001536c0 <+40>: ldx [ %i5 + 8 ], %g2 0xfffff801001536c4 <+44>: ldx [ %fp + 0x87f ], %g1 0xfffff801001536c8 <+48>: sllx %g1, 4, %g1 0xfffff801001536cc <+52>: add %g2, %g1, %g1 0xfffff801001536d0 <+56>: ldx [ %g1 + 8 ], %g1 0xfffff801001536d4 <+60>: cmp %g1, 0x38 0xfffff801001536d8 <+64>: bne %xcc, 0xfffff801001536f8 <check64+96> 0xfffff801001536dc <+68>: nop 0xfffff801001536e0 <+72>: ldx [ %i5 ], %g1 0xfffff801001536e4 <+76>: mov 2, %g2 0xfffff801001536e8 <+80>: stb %g2, [ %g1 ] 0xfffff801001536ec <+84>: mov 1, %g1 0xfffff801001536f0 <+88>: b %xcc, 0xfffff80100153770 <check64+216> 0xfffff801001536f4 <+92>: nop 0xfffff801001536f8 <+96>: ldx [ %i5 + 8 ], %g2 0xfffff801001536fc <+100>: ldx [ %fp + 0x87f ], %g1 
0xfffff80100153700 <+104>: sllx %g1, 4, %g1 0xfffff80100153704 <+108>: ldx [ %g2 + %g1 ], %i4 0xfffff80100153708 <+112>: mov 4, %o0 0xfffff8010015370c <+116>: call 0xfffff8010015364c <__bswap_64> 0xfffff80100153710 <+120>: nop 0xfffff80100153714 <+124>: mov %o0, %g1 0xfffff80100153718 <+128>: cmp %i4, %g1 0xfffff8010015371c <+132>: bne %xcc, 0xfffff8010015376c <check64+212> 0xfffff80100153720 <+136>: nop 0xfffff80100153724 <+140>: ldx [ %i5 + 8 ], %g2 0xfffff80100153728 <+144>: ldx [ %fp + 0x87f ], %g1 0xfffff8010015372c <+148>: sllx %g1, 4, %g1 0xfffff80100153730 <+152>: add %g2, %g1, %g1 0xfffff80100153734 <+156>: ldx [ %g1 + 8 ], %i4 0xfffff80100153738 <+160>: mov 0x38, %o0 0xfffff8010015373c <+164>: call 0xfffff8010015364c <__bswap_64> 0xfffff80100153740 <+168>: nop 0xfffff80100153744 <+172>: mov %o0, %g1 0xfffff80100153748 <+176>: cmp %i4, %g1 0xfffff8010015374c <+180>: bne %xcc, 0xfffff8010015376c <check64+212> 0xfffff80100153750 <+184>: nop 0xfffff80100153754 <+188>: ldx [ %i5 ], %g1 0xfffff80100153758 <+192>: mov 1, %g2 0xfffff8010015375c <+196>: stb %g2, [ %g1 ] 0xfffff80100153760 <+200>: mov 1, %g1 0xfffff80100153764 <+204>: b %xcc, 0xfffff80100153770 <check64+216> 0xfffff80100153768 <+208>: nop 0xfffff8010015376c <+212>: clr %g1 ! 0x0 0xfffff80100153770 <+216>: and %g1, 0xff, %g1 0xfffff80100153774 <+220>: mov %g1, %i0 0xfffff80100153778 <+224>: rett %i7 + 8 0xfffff8010015377c <+228>: nop End of assembler dump. 
(gdb) info registers g0 0x0 0 g1 0x0 0 g2 0xfffff8010070441c -8791790697444 g3 0xc 12 g4 0xc8 200 g5 0x7feffda89b0 8791795599792 g6 0x454c465554494c53 4993443419147291731 g7 0xfffff801000030d0 -8791798042416 o0 0x0 0 o1 0x0 0 o2 0x0 0 o3 0x0 0 o4 0x0 0 o5 0x0 0 sp 0x7feffda8031 0x7feffda8031 o7 0x0 0 l0 0x0 0 l1 0x0 0 l2 0x0 0 l3 0x0 0 l4 0x0 0 l5 0x0 0 l6 0x0 0 l7 0x0 0 i0 0x0 0 i1 0x0 0 i2 0x0 0 i3 0x0 0 i4 0x0 0 i5 0x7feffda89b0 8791795599792 fp 0x7feffda80e1 0x7feffda80e1 i7 0xfffff80100153824 -8791796664284 pc 0xfffff801001536b0 0xfffff801001536b0 <check64+24> npc 0xfffff801001536b4 0xfffff801001536b4 <check64+28> state 0xf0001206 4026536454 fsr 0x0 [ ] fprs *value not available* y 0x0 0 cwp 0x6 6 pstate 0x12 [ IE PEF ] asi 0xf0 240 ccr 0x0 0
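Reading the dump: the faulting instruction is ldx [ %g2 + %g1 ], %g1, an 8-byte load, with %g2 = 0xfffff8010070441c and %g1 = 0. That address matches si_addr in the strace output, and BUS_ADRALN means "invalid address alignment". The address ends in 0xc, so it is 4-byte aligned but not 8-byte aligned, and SPARC traps on a misaligned ldx. gdb could still print the comparison, presumably because it reads the inferior's memory through the kernel rather than with an ldx of its own. A small sketch of the arithmetic (is_aligned is a hypothetical helper name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if ADDR is a multiple of
   ALIGNMENT, which must be a nonzero power of two.  */
static bool
is_aligned (uint64_t addr, uint64_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}
```

With the address from the register dump, is_aligned (0xfffff8010070441cULL, 8) is false while is_aligned (0xfffff8010070441cULL, 4) is true, which fits an auxv note that is only 4-byte aligned inside the mapped core file.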
Created attachment 672431 [details] fix for unaligned accesses I got this reproduced on Itanium, which is sensitive to unaligned access as well (except whether it sends SIGBUS is configurable with prctl).
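The attachment itself is not reproduced in this thread, so the following is only a guess at the general shape of such a fix, not its actual contents. The idea is to copy each auxv entry into an aligned local before comparing, instead of indexing the mapped buffer through a union pointer. struct auxv64_entry and auxv64_entry_matches are hypothetical names standing in for Elf64_auxv_t and the check64 logic, and byte-order conversion is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Elf64_auxv_t: two 64-bit words.  */
struct auxv64_entry
{
  uint64_t a_type;
  uint64_t a_val;
};

/* Hypothetical helper: return true if entry I of the (possibly
   unaligned) AUXV buffer has the given TYPE and VAL.  The entry is
   copied into an aligned local with memcpy, so no wide load ever
   touches the unaligned buffer.  */
static bool
auxv64_entry_matches (const void *auxv, size_t i,
                      uint64_t type, uint64_t val)
{
  assert (auxv != NULL);
  struct auxv64_entry e;
  memcpy (&e, (const unsigned char *) auxv + i * sizeof e, sizeof e);
  return e.a_type == type && e.a_val == val;
}
```

In the probe from link_map.c the values compared would be AT_PHENT (4) and sizeof (Elf64_Phdr) (56), each checked in both byte orders.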
[mockbuild@localhost elfutils-0.155]$ make check TESTS="run-unstrip-n.sh"
PASS: run-unstrip-n.sh
=============
1 test passed
=============

And with that,...

[mockbuild@localhost elfutils-0.155]$ make check
===================
All 88 tests passed
===================

Yea! Finally we can all go back to sleep again. Thank you very much for your time and help.

Phil =--=
I've fixed all the unaligned access issues on the trunk. I tested it with a build done with -m64 on a sparc64 machine. We probably don't need to backport these fixes to Fedora packages, but can just make another release before too long instead. Mark can decide whether to just close this bug as NEXTRELEASE or to do the backports in the Fedora package. On the machine I have access to, there are some elflint-self failures that look like binutils bugs (all the same one bug). But that machine is not Fedora so I don't know if Fedora binutils produces any such problems.
(In reply to comment #19)
> I've fixed all the unaligned access issues on the trunk.
> I tested it with a build done with -m64 on a sparc64 machine.

Thanks. BTW, are sparc64 and ia64 the only arches on which these issues show up?

> We probably don't need to backport these fixes to Fedora packages, but can
> just make another release before too long instead. Mark can decide whether
> to just close this bug as NEXTRELEASE or to do the backports in the Fedora
> package.

If the reporter wants, I can add the patches to the rawhide package. Please just yell.

> On the machine I have access to, there are some elflint-self failures that
> look like binutils bugs (all the same one bug). But that machine is not
> Fedora so I don't know if Fedora binutils produces any such problems.

I am afraid this is upstream ld binutils:
http://sourceware.org/bugzilla/show_bug.cgi?id=13621
Exposed when using gcc 4.7+, I believe.
There used to be a fix for it by rth, which was also backported to Fedora. But it turned out that it caused a different issue, and so it was reverted again.

The issue is that GNU ld (but not gold) removes "empty sections" even though there might be symbols for that section. rth's fix just marked such symbols as absolute (which apparently confused the Linux kernel). I tried to create a patch to keep track of symbols associated with sections so they wouldn't get discarded even when the sections were empty. But BFD makes that a lot of work, and I haven't yet tracked down everywhere that this info needs to be passed around.

We might have to add some extra hack to elflint --gnu-ld to allow "dangling symbols". But I couldn't think of a good way to detect them, since the symbol section that GNU ld assigns to these symbols is basically random (I believe it is just the old section number, but all sections are renumbered).
Would be nice, thanks. (rawhide) I've been running the patches out of tree for the past few days and haven't seen anything terribly wrong with elfutils/libs as yet. (Though there are loads of other fun issues I'm trying to beat down in other packages which rely on elfutils). Thanks. Phil =--=
(In reply to comment #20)
> > On the machine I have access to, there are some elflint-self failures that
> > look like binutils bugs (all the same one bug). But that machine is not
> > Fedora so I don't know if Fedora binutils produces any such problems.
>
> I am afraid this is upstream ld binutils:
> http://sourceware.org/bugzilla/show_bug.cgi?id=13621
> Exposed when using gcc 4.7+ I believe.
> There used to be a fix for it by rth. Which was also backported to fedora.
> But it turned out that caused a different issue and so was then reverted
> again.
>
> The issue is that GNU ld (but not gold) removes "empty sections" even though
[...]

That is not the issue I saw on davem's sparc64 machine. What I saw was bogus .gnu.attributes sections, which is unrelated.
(In reply to comment #21)
> Would be nice, thanks. (rawhide)
> I've been running the patches out of tree for the past few days and haven't
> seen anything terribly wrong with elfutils/libs as yet. (Though there are
> loads of other fun issues I'm trying to beat down in other packages which
> rely on elfutils).

elfutils-0.155-2.fc19 has the patches. Don't know if/when the sparc koji picks it up.
Not for a while (I'm trying to get the fc18 buildroot stable to hand over to spot/dgilmore but ,.. yi yi yi,.. so many issues!) Phil (attempting to get the fedora core secondary archs - sparc stuff working) =--=