Bug 891553 - Failures in the self tests (some quite serious)
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: elfutils
Version: 18
Hardware: sparc64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-01-03 09:05 UTC by Bryce
Modified: 2013-01-10 11:41 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-01-08 09:03:00 UTC
Type: Bug
Embargoed:


Attachments
elf_getarsym patch for unaligned access (497 bytes, patch)
2013-01-03 20:36 UTC, Roland McGrath
fix for unaligned accesses (2.89 KB, patch)
2013-01-04 15:21 UTC, Petr Machata

Description Bryce 2013-01-03 09:05:03 UTC
elfutils-0.155-1 fails 5 tests in a sparc64 environment (Big Endian)
--------------------------------------------------------------------
(This is just for tracking atm until I can wrap a debugger around each test and work through them and determine what they should ACTUALLY produce as output)


--- readelf.out 2013-01-03 03:47:30.202707226 -0500
+++ -   2013-01-03 03:47:30.206443707 -0500
@@ -1 +1,8 @@
-./test-subr.sh: line 73: 63075 Bus error               (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@"
+
+Index of archive 'testfile19.index' has 4 entries:
+Archive member 'u1.o' contains:
+       a
+Archive member 'u2.o' contains:
+       aa
+Archive member 'u3.o' contains:
+       a
FAIL: run-readelf-test4.sh

--------------------------------------------------------------------------------

allregs: 63101: No such file or directory
FAIL: run-native-test.sh

--------------------------------------------------------------------------------

./dwfl-bug-fd-leak: dwfl_linux_proc_report: No such file or directory
FAIL: dwfl-bug-fd-leak

--------------------------------------------------------------------------------

--- unstrip.out 2013-01-03 03:47:32.372707180 -0500
+++ -   2013-01-03 03:47:32.376872122 -0500
@@ -1 +1,6 @@
-./test-subr.sh: line 73: 63367 Bus error               (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@"
+0x10000000+0x20000 979b7a26747cc09bd84a42b311b5288c704baea5@0x10000174 . - [exe]
+0x100000+0x10000 708b900b05176964512a6b0fe90c2a0c9d73d726@0x100334 . - linux-vdso32.so.1
+0xfd50000+0x30000 3f7d21508470322d2f47acddc20ab10516edba99@0xfd50164 /lib/librt.so.1 - librt.so.1
+0xfdf0000+0x1c0000 edf3dd232e09d01b90683889bd16b9406c52d4de@0xfdf0184 /lib/libc.so.6 - libc.so.6
+0xfdb0000+0x40000 f6ee91d4c629bc7dacc10534cb30056914e7e0b5@0xfdb0164 /lib/libpthread.so.0 - libpthread.so.0
+0xffb0000+0x50000 edec437a85026a1cf8cda94003706202733130c1@0xffb0124 /lib/ld.so.1 - ld.so.1
FAIL: run-unstrip-n.sh

--------------------------------------------------------------------------------

--- readelf.out 2013-01-03 03:47:32.577707176 -0500
+++ -   2013-01-03 03:47:32.581283035 -0500
@@ -1 +1,11 @@
-./test-subr.sh: line 73: 63420 Bus error               (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@"
+
+Index of archive 'testarchive64.a' has 7 entries:
+Archive member 'aaa.o' contains:
+       aaa
+Archive member 'bbb.o' contains:
+       bbb
+       bbb2
+Archive member 'ccc.o' contains:
+       ccc
+       ccc2
+       ccc3
FAIL: run-test-archive64.sh

Phil
=--=

Comment 1 Bryce 2013-01-03 09:21:57 UTC
pfft,... ok run-native-test.sh will fail because it is expecting run-allregs.sh to run and leave an 'allregs' binary around,.. which would be nice except it's INTEL SPECIFIC.... that needs to be skipped in run-native-test.sh when it's not running on an Intel-compatible box.

Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ I've a horrible suspicion that it wasn't meant to be able to write any file at all

The three remaining problems are all Bus errors so I'm hoping they're all linked to one common issue. With luck it'll be simple pointer math that ignores the big-endian environment.

Comment 2 Mark Wielaard 2013-01-03 10:40:37 UTC
(In reply to comment #1)
> pfft,... ok run-native-test.sh will fail because it is expecting
> run-allregs.sh to run and leave a 'allregs' binary around,.. which would be
> nice expect it's INTEL SPECIFIC.... that needs to be skipped in
> run-native-test.sh when it's not running on an intel compatible box.

allregs should be a native binary (see tests/Makefile.am check_PROGRAMS).

Comment 3 Mark Wielaard 2013-01-03 10:48:32 UTC
(In reply to comment #1)
> Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/

It will test the dwfl_linux_proc_report (dwfl, pid) call which will try to open /proc/<pid>/maps which seems to fail for you for some reason.

Comment 4 Petr Machata 2013-01-03 11:16:18 UTC
(In reply to comment #1)
> pfft,... ok run-native-test.sh will fail because it is expecting
> run-allregs.sh to run and leave a 'allregs' binary around,.. which would be
> nice expect it's INTEL SPECIFIC.... that needs to be skipped in
> run-native-test.sh when it's not running on an intel compatible box.

The binary is created by the build machinery, not by another test.  It's also not Intel-specific.

The message you are seeing comes from libdwfl and is likely due to one of /proc/$PID/{maps,mem,auxv} missing or being unreadable.  Do those normally work on your installation?

> Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/ I've a
> horrible suspicion that it wasn't meant to be able to write any file at all

Same as above.

> The three remaining problems are all Bus errors so I'm hoping they're all
> commonly linked to one issue. With luck it'll be simple pointer math
> ignoring a BE endian environment issue.

It might be an unaligned access.  elfutils works fine on s390 and PowerPC, which are both big endian machines under Linux.

Comment 5 Bryce 2013-01-03 17:23:55 UTC
(In reply to comment #3)
> (In reply to comment #1)
> > Not entirely sure what way dwfl-bug-fd-leak is meant to work 8/
> 
> It will test the dwfl_linux_proc_report (dwfl, pid) call which will try to
> open /proc/<pid>/maps which seems to fail for you for some reason.

Ah,.. that would be because I chrooted into the environment instead of using 'mock --shell' when I last ran the tests (--shell tends to get the termcap slightly wrong and I end up bailing out at inconvenient moments as a result), so
DEBUG util.py:307:  Executing command: ['/bin/mount', '-n', '-t', 'proc', 'proc', '/var/lib/mock/fc18-rebuild/root/proc'] with env {'LANG': 'en_US.UTF-8', 'TERM': 'vt100', 'SHELL': '/bin/bash', 'HOSTNAME': 'mock', 'HOME': '/builddir', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin'}
never ran and thus no /proc was available for the test.

<mock-chroot>[root@localhost elfutils-0.155]# make check TESTS="dwfl-bug-fd-leak"
...
PASS: dwfl-bug-fd-leak
=============
1 test passed
=============

Ok, that one is down to myself. I'll see about adding a patch to check that /proc is actually mounted though.
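
A minimal sketch of such a check, assuming that a readable /proc/self/maps is a good enough sign that /proc is mounted. The helper names proc_is_usable and skip_status are made up for illustration and are not part of elfutils; the only external fact relied on is that automake's test driver reports exit status 77 as SKIP.

```c
#include <unistd.h>

/* Hypothetical helper: nonzero if /proc looks mounted and readable.  */
static int
proc_is_usable (void)
{
  return access ("/proc/self/maps", R_OK) == 0;
}

/* Exit status a test could use: 0 to carry on, 77 to ask automake's
   test driver to report SKIP instead of FAIL.  */
static int
skip_status (void)
{
  return proc_is_usable () ? 0 : 77;
}
```

A test program such as dwfl-bug-fd-leak could call skip_status () at the top of main and exit with that value when it is nonzero, so a chroot without /proc shows up as a skip rather than a failure.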

Comment 6 Bryce 2013-01-03 17:26:26 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > pfft,... ok run-native-test.sh will fail because it is expecting
> > run-allregs.sh to run and leave a 'allregs' binary around,.. which would be
> > nice expect it's INTEL SPECIFIC.... that needs to be skipped in
> > run-native-test.sh when it's not running on an intel compatible box.
> 
> allregs should be a native binary (see tests/Makefile.am check_PROGRAMS).

Hurm seems to also require /proc mounted?

<mock-chroot>[root@localhost elfutils-0.155]# make check TESTS="run-native-test.sh"
...
PASS: run-native-test.sh
=============
1 test passed
=============

Comment 7 Bryce 2013-01-03 18:27:50 UTC
(In reply to comment #0)

> --- readelf.out 2013-01-03 03:47:30.202707226 -0500

Now that I've actually slept and can think in some small capacity,...

set -xv in the script and reran so I can see what's being called where/when

--- readelf.out 2013-01-03 12:31:21.502040278 -0500
+++ -   2013-01-03 12:31:21.505178510 -0500
@@ -1,4 +1,8 @@
-+ built_testrun ../src/readelf -c testfile19.index
-+ LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm
-+ ../src/readelf -c testfile19.index
-./test-subr.sh: line 73: 11906 Bus error               (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@"
+
+Index of archive 'testfile19.index' has 4 entries:
+Archive member 'u1.o' contains:
+       a
+Archive member 'u2.o' contains:
+       aa
+Archive member 'u3.o' contains:
+       a
FAIL: run-readelf-test4.sh

---------------------------------------------------------------------
Let's stick gdb around that

First let's get the testfile unpacked

<mock-chroot>[root@localhost /]# cd /builddir/build/BUILD/elfutils-0.155/tests
<mock-chroot>[root@localhost tests]# bzip2 -cd testfile19.index.bz2 > testfile19.index


Let's just check that this is really where it's going wrong
export LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm"
strace -f /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index

open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0+\0\0\0\1\0\0\0\0\0\0028@"..., 832) = 832
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 2588256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xfffff8010047c000
mprotect(0xfffff801005ea000, 1048576, PROT_NONE) = 0
mmap(0xfffff801006ea000, 32768, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16e000) = 0xfffff801006ea000
mmap(0xfffff801006f2000, 7776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xfffff801006f2000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffff80100002000
mprotect(0xfffff801006ea000, 16384, PROT_READ) = 0
mprotect(0xfffff80100362000, 8192, PROT_READ) = 0
mprotect(0xfffff80100256000, 8192, PROT_READ) = 0
mprotect(0xfffff8010013a000, 8192, PROT_READ) = 0
mprotect(0x226000, 8192, PROT_READ)     = 0
mprotect(0xfffff801184e8000, 8192, PROT_READ) = 0
munmap(0xfffff8010025c000, 13570)       = 0
brk(0)                                  = 0x22a000
brk(0x24c000)                           = 0x24c000
brk(0)                                  = 0x24c000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 104822576, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff801006f4000
close(3)                                = 0
open("testfile19.index", O_RDONLY)      = 3
fcntl(3, F_GETFL)                       = 0x40000 (flags O_RDONLY|O_LARGEFILE)
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 3152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff8010025c000
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010025c044} ---
+++ killed by SIGBUS (core dumped) +++


Ok,.. looks like it opened the test file, then queried the file flags and attributes, then went off to call mmap()?.. then again that might just be a macro expansion that calls mmap, at which point it falls apart.. fine, let's try wrapping gdb around it.


<mock-chroot>[root@localhost tests]# gdb ../src/readelf
GNU gdb (GDB) Fedora (7.5.0.20120926-25.fc18)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /builddir/build/BUILD/elfutils-0.155/src/readelf...done.
(gdb) 
(gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm"
(gdb) show environment LD_LIBRARY_PATH
LD_LIBRARY_PATH = "../libdw:../backends:../libelf:../libasm"
(gdb) set args -c testfile19.index
(gdb) show args
Argument list to give program being debugged when it is started is "-c testfile19.index".
(gdb) run
Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
thread_get_info_callback: cannot get thread info: generic error
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-7.fc18.sparc64 elfutils-libs-0.152-1.fc12.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64 xz-compat-libs-5.1.2-2alpha.fc18.sparc64 zlib-1.2.7-9.fc18.sparc64
(gdb) 



nurgh,... sec,. lemme get all those installed and restart. (bootstrap environments tend to be messy)
Hurm,..  don't have all the debuginfos,.. with luck this is enough
   1:zlib-debuginfo-1.2.7-9.fc18      ################################# [ 25%]
   2:xz-debuginfo-5.1.2-2alpha.fc18   ################################# [ 50%]
   3:glibc-debuginfo-2.16.90-40.fc18  ################################# [ 75%]
   4:bzip2-debuginfo-1.0.6-7.fc18     ################################# [100%]


(gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm"
(gdb) set args -c testfile19.index
(gdb) run
Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
thread_get_info_callback: cannot get thread info: generic error
Missing separate debuginfos, use: debuginfo-install elfutils-libs-0.152-1.el6.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64
(gdb) where
Target is executing.

Gurr,.. threaded debugging,.. not working as expected
(gdb) set environment LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm"
(gdb) set args -c testfile19.index
(gdb) break main
Breakpoint 1 at 0x103140: file readelf.c, line 249.
(gdb) run
Starting program: /builddir/build/BUILD/elfutils-0.155/src/readelf -c testfile19.index
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[tcsetpgrp failed in terminal_inferior: Inappropriate ioctl for device]
thread_get_info_callback: cannot get thread info: generic error
Missing separate debuginfos, use: debuginfo-install elfutils-libs-0.152-1.el6.sparc64 glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64
(gdb) where
Target is executing.
(gdb) thread 1
[Switching to thread 1 (Thread 0xfffff8010002ab90 (LWP 12146))](running)
(gdb) where
Target is executing.

.... give up on gdb then 8/

Comment 8 Bryce 2013-01-03 20:07:12 UTC
Chasing using the old-school printf("XYZ\n"); method

static void
dump_archive_index (Elf *elf, const char *fname)
{
printf("2a\n");
  size_t narsym;
  const Elf_Arsym *arsym = elf_getarsym (elf, &narsym);
printf("2b\n");

(2b doesn't show,..)

so,.. where is elf_getarsym() from?.. libelf/elf_getarsym.c by the look of it.

In a way this might be a good thing as it would be shared with the other tests that failed. Assuming they all have the same issue.

------------------------------------------------------
libelf/elf_getarsym.c

3g
3m
bash-4.2#

      int w = index64_p ? 8 : 4;
printf("3g\n");

      /* We have an archive.  The first word in there is the number of
         entries in the table.  */
      uint64_t n;
      size_t off = elf->start_offset + SARMAG + sizeof (struct ar_hdr);
printf("3m\n");

      if (read_number_entries (&n, elf, &off, index64_p) < 0)
        {
          /* Cannot read the number of entries.  */
          __libelf_seterrno (ELF_E_NO_INDEX);
printf("3q\n");
          goto out;
        }
printf("3n\n");
    182         }
    183 printf("3n\n");


3q or n  doesn't show up,.. follow into  read_number_entries()
------------------------------------------------------
static int
read_number_entries (uint64_t *nump, Elf *elf, size_t *offp, bool index64_p)
{
  union u
  {
    uint64_t ret64;
    uint32_t ret32;
  } u;

Oh,.. union 8/ I remember all kinds of warnings about unions on big endian machines... Ok lets get the show started, more printf()'s,..

4a
4b
4c
bash-4.2# 

printf("4a\n");
  size_t w = index64_p ? 8 : 4;
printf("4b\n");
  if (elf->map_address != NULL) {
printf("4c\n");
    u = *(union u *) (elf->map_address + *offp);
printf("4d\n");


Ok we arrive here,.. and this is apparently where bad things happen.. (eyes glazing over)

Ummm,....
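
For what it's worth, the arithmetic seems to line up with the strace output in comment 7. A minimal sketch, assuming the archive is mapped at a page-aligned address and start_offset is 0 (both are assumptions here; index_offset is a hypothetical helper, not a libelf function): the index body starts SARMAG + sizeof (struct ar_hdr) = 8 + 60 = 68 = 0x44 bytes into the file, which matches si_addr ending in ...c044. That address is 4-byte aligned but not 8-byte aligned, so copying a union that contains a uint64_t straight out of the mapping may well be what traps on sparc64.

```c
#include <ar.h>
#include <stddef.h>

/* Hypothetical helper: offset of the archive index body, i.e. the first
   byte after the global "!<arch>\n" magic and the first member header.  */
static size_t
index_offset (size_t start_offset)
{
  return start_offset + SARMAG + sizeof (struct ar_hdr);
}
```

With start_offset 0 this gives 68, and 68 % 8 is 4, so an 8-byte load from there is misaligned while a 4-byte load is fine.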

Comment 9 Roland McGrath 2013-01-03 20:36:01 UTC
Created attachment 672206 [details]
elf_getarsym patch for unaligned access

Please try the attached patch and see if it helps.

Comment 10 Bryce 2013-01-03 20:48:34 UTC
printf("1a\n");
  size_t w = index64_p ? 8 : 4;
printf("1b\n");
  if (elf->map_address != NULL) {
printf("1c\n");
/*    u = *(union u *) (elf->map_address + *offp); */
memcpy (&u, elf->map_address + *offp, sizeof u);
printf("1d\n");


1a
1b
1c
1d
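
A standalone sketch of why the memcpy variant probably helps: memcpy makes no alignment assumption about its source, whereas dereferencing a cast pointer lets the compiler emit one aligned load. The function name read_count and its parameters are made up for illustration; unlike the line above it copies only the 4 or 8 bytes it needs, and it returns the value in host byte order without any swapping.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as the union in read_number_entries.  */
union count
{
  uint64_t ret64;
  uint32_t ret32;
};

/* Hypothetical helper: fetch a 4- or 8-byte count from P, which may sit
   at any alignment.  The result is in host byte order.  */
static uint64_t
read_count (const unsigned char *p, int index64_p)
{
  union count u;

  /* Byte-wise copy; no aligned 8-byte load from P is required.  */
  memcpy (&u, p, index64_p ? sizeof u.ret64 : sizeof u.ret32);
  return index64_p ? u.ret64 : u.ret32;
}
```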

Ok let's remove the printf()s and run the test again

------------------------------------------
bash-4.2# (cd /builddir/build/BUILD/elfutils-0.155 ; make check TESTS="run-readelf-test4.sh")
make[2]: Entering directory `/builddir/build/BUILD/elfutils-0.155/tests'
PASS: run-readelf-test4.sh
=============
1 test passed
=============

Huzzah!.. ok let's run the full suite and see if the other two shared the same fault.

hellfire,.. no they're different 8/ ... -sigh- more printf debugging ... it's gonna be a long day

Still one down isn't a bad thing 8) thanks.

I'll keep digging here

Phil
=--=

Comment 11 Bryce 2013-01-03 22:07:40 UTC
Next up,.. let's follow through on the run-test-archive64.sh test

LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" ../src/readelf -c testarchive64.a

@@ -1,2 +1,11 @@
-Running LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm" ../src/readelf -c testarchive64.a
-./test-subr.sh: line 73: 53096 Bus error               (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" "$@"
+
+Index of archive 'testarchive64.a' has 7 entries:
+Archive member 'aaa.o' contains:
+       aaa
+Archive member 'bbb.o' contains:
+       bbb
+       bbb2
+Archive member 'ccc.o' contains:
+       ccc
+       ccc2
+       ccc3

-----------------------------------------------------
unpack the testfile
bash-4.2# cd /builddir/build/BUILD/elfutils-0.155/tests
bash-4.2# bzip2 -cd testarchive64.a.bz2 > testarchive64.a

check a run with strace for anything useful
bash-4.2# export LD_LIBRARY_PATH="../libdw:../backends:../libelf:../libasm"
bash-4.2# strace -f ../src/readelf -c testarchive64.a

open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 104822576, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff801006f4000
close(3)                                = 0
open("testarchive64.a", O_RDONLY)       = 3
fcntl(3, F_GETFL)                       = 0x40000 (flags O_RDONLY|O_LARGEFILE)
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 4360, PROT_READ, MAP_PRIVATE, 3, 0) = 0xfffff8010025c000
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010025c04c} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)

Ok,.. time to go hunting wabbits. I note that this is still "../src/readelf -c" being run, same as above for run-readelf-test4.sh. Let's see where it leads.

actually the trace is pretty much identical except further along.

4k
4k2
4k2a
bash-4.2# 

    209           if (elf->map_address == NULL)
    210             {
    211               file_data = alloca (sz);
    212 
    213               ar_sym_len += index_size - n * w;
    214               Elf_Arsym *newp = (Elf_Arsym *) realloc (elf->state.ar.ar_sym,
    215                                                        ar_sym_len);
    216               if (newp == NULL)
    217                 {
    218                   free (elf->state.ar.ar_sym);
    219                   elf->state.ar.ar_sym = NULL;
    220                   __libelf_seterrno (ELF_E_NOMEM);
    221                   goto out;
    222                 }
    223               elf->state.ar.ar_sym = newp;
    224 
    225               char *new_str = (char *) (elf->state.ar.ar_sym + n + 1);
    226 
    227               /* Now read the data from the file.  */
    228               if ((size_t) pread_retry (elf->fildes, file_data, sz, off) != sz
    229                   || ((size_t) pread_retry (elf->fildes, new_str,
    230                                             index_size - sz, off + sz)
    231                       != index_size - sz))
    232                 {
    233                   /* We were not able to read the data.  */
    234                   free (elf->state.ar.ar_sym);
    235                   elf->state.ar.ar_sym = NULL;
    236                   __libelf_seterrno (ELF_E_NO_INDEX);
    237                   goto out;
    238                 }
    239 
    240               str_data = (char *) new_str;
    241             }
    242           else
    243             {
--> 244 printf("4k\n");
    245               file_data = (void *) (elf->map_address + off);
    246               str_data = (char *) (elf->map_address + off + sz);
    247             }
    248 
    249           /* Now we can build the data structure.  */
    250           Elf_Arsym *arsym = elf->state.ar.ar_sym;
    251           for (size_t cnt = 0; cnt < n; ++cnt)
    252             {
    253               arsym[cnt].as_name = str_data;
    254 printf("4k2\n");              if (index64_p)
    255                 {
    256 printf ("4k2a - cnt = %d \n", (int)cnt);
**> 257                   uint64_t tmp = file_data->u64[cnt];
    258 printf ("4k2b\n");
    259                   if (__BYTE_ORDER == __LITTLE_ENDIAN)
    260                     tmp = bswap_64 (tmp);
    261 printf ("4k2c\n");
    262 
    263                   arsym[cnt].as_off = tmp;


*bang*
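
The faulting address appears to fit: once the 8-byte entry count has been consumed, off would be 68 + 8 = 76 = 0x4c, matching si_addr ending in ...c04c, so file_data->u64[0] is an 8-byte load from an address that is only 4-byte aligned. Below is a sketch of an alignment-safe replacement for the indexed load. It assumes the on-disk entries are big-endian (which the bswap_64 on little-endian hosts above suggests); load_be64_at is a hypothetical name, and this is not necessarily how the attached patches do it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch the CNT-th big-endian 64-bit entry from a
   table starting at TABLE, which need not be 8-byte aligned.  */
static uint64_t
load_be64_at (const void *table, size_t cnt)
{
  unsigned char raw[sizeof (uint64_t)];
  uint64_t value = 0;

  /* Copy the bytes out first instead of dereferencing a uint64_t *.  */
  memcpy (raw, (const unsigned char *) table + cnt * sizeof raw, sizeof raw);

  /* Assemble most significant byte first; correct on any host byte order,
     so no separate bswap step is needed.  */
  for (size_t i = 0; i < sizeof raw; ++i)
    value = (value << 8) | raw[i];
  return value;
}
```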

Comment 12 Bryce 2013-01-03 22:09:43 UTC
Sorry,. I should have indicated more clearly that this is in
libelf/elf_getarsym.c

Comment 13 Bryce 2013-01-04 02:12:55 UTC
Hurm,.. I'm looking at the run-unstrip-n.sh test

it was able to chew through the testcore-rtlib file fine but abended while processing the testcore-rtlib-ppc corefile

-----------------------------------------------------
bash-4.2# export LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm
bash-4.2# ../src/unstrip -n --core=testcore-rtlib
0x8048000+0x2000 f1c600bc36cb91bf01f9a63a634ecb79aa4c3199@0x8048178 . - [exe]
0xf77d6000+0x1000 676560b1b765cde9c2e53f134f4ee354ea894747@0xf77d6210 . - linux-gate.so.1
0xf77b3000+0x9000 c6c5b5e35ab9589d4762ac85b4bd56b1b2720e37@0xf77b3164 /lib/librt.so.1 - librt.so.1
0xf7603000+0x1b0000 0b9bf374699e141e5dfc14757ff42b8c2373b4de@0xf7603184 /lib/libc.so.6 - libc.so.6
0xf75e9000+0x1a000 29a103420abe341e92072fb14274e250e4072148@0xf75e9164 /lib/libpthread.so.0 - libpthread.so.0
0xf77d7000+0x21000 6d2cb32650054f1c176d01d48713a4a5e5e84c1a@0xf77d7124 /lib/ld-linux.so.2 - ld-linux.so.2

-----------------------------------------------------
bash-4.2# ../src/unstrip -n --core=testcore-rtlib-ppc
Bus error (core dumped)

-----------------------------------------------------

Tracking,..
open("testcore-rtlib-ppc", O_RDONLY)    = 3
fcntl(3, F_GETFL)                       = 0x40000 (flags O_RDONLY|O_LARGEFILE)
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 1376256, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xfffff80106aec000
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff80106aec41c} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)

Oh joy,..
meh I guess I should really use __LINE__, __FILE__, __func__ this time

[mockbuild@localhost elfutils-0.155]$ (cd /builddir/build/BUILD/elfutils-0.155; make ; bzip2 -cd tests/testcore-rtlib-ppc.bz2 >  tests/testcore-rtlib-ppc ; (cd tests; LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm ../src/unstrip -n --core=testcore-rtlib-ppc))
Line 2222
Line 2224
Line 2241
Line 2251
Line 2285
[mockbuild@localhost elfutils-0.155]$ 

   2285 printf("Line %d\n", __LINE__);
   2286   int remaining;
   2287   struct arg_info info = { .args = NULL };
   2288   error_t result = argp_parse (&argp, argc, argv, 0, &remaining, &info);
   2289 printf("Line %d\n", __LINE__);


Line 2251
Line 2285
Line 2288
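
A small sketch of that kind of marker as a reusable macro; TRACE is a made-up name. It writes to stderr rather than stdout, on the assumption that output may be redirected at some point: stdout is then fully buffered, and whatever is still sitting in the buffer is lost when the process dies of SIGBUS, which can make the last marker seen misleading.

```c
#include <stdio.h>

/* Hypothetical tracing macro: prints file, line and function to stderr.
   stderr is unbuffered by default, so the marker is written out
   immediately.  Evaluates to the return value of fprintf.  */
#define TRACE() \
  fprintf (stderr, "%s:%d (%s)\n", __FILE__, __LINE__, __func__)
```

Dropping TRACE (); before and after a suspect call gives the same bracketing as the printf lines above, without having to invent a new label each time.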

The following assumes gdb isn't lying...

(gdb) set environment LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm
(gdb) set args  -n --core=testcore-rtlib-ppc
(gdb) break 2288
Breakpoint 1 at 0x1020d0: file unstrip.c, line 2288.
(gdb) run
Breakpoint 1, main (argc=<optimized out>, argv=0x7fefffff6a8) at unstrip.c:2289
2289      error_t result = argp_parse (&argp, argc, argv, 0, &remaining, &info);
(gdb) set print pretty on
(gdb) p argp
$4 = {
  options = 0x108a70 <options>, 
  parser = 0x1034a0 <parse_opt>, 
  args_doc = 0x1081d8 "STRIPPED-FILE DEBUG-FILE\n[MODULE...]", 
  doc = 0x108200 "Combine stripped files with separate symbols and debug information.\vThe first form puts the result in DEBUG-FILE if -o was not given.\n\nMODULE arguments give file name patterns matching modules to proc"..., 
  children = 0x7fefffff2e0, 
  help_filter = 0x0, 
  argp_domain = 0x0
}
(gdb) print info
$8 = {
  output_file = 0x0, 
  output_dir = 0x0, 
  dwfl = 0x0, 
  args = 0x0, 
  list = false, 
  all = false, 
  ignore = false, 
  modnames = false, 
  match_files = false, 
  relocate = false
}
(gdb) step
Program received signal SIGBUS, Bus error.
auxv_format_probe (elfdata=<optimized out>, elfclass=<optimized out>, size=<optimized out>, auxv=<optimized out>) at link_map.c:107
107           if (check64 (i))
(gdb) where
#0  auxv_format_probe (elfdata=<optimized out>, elfclass=<optimized out>, size=<optimized out>, auxv=<optimized out>) at link_map.c:107
#1  dwfl_link_map_report (auxv=0xfffff8010081041c, auxv_size=200, memory_callback=memory_callback@entry=0xfffff80100269a20 <dwfl_elf_phdr_memory_callback>, memory_callback_arg=memory_callback_arg@entry=0x20c590) at link_map.c:614
#2  0xfffff8010026a244 in dwfl_core_file_report (dwfl=dwfl@entry=0x20c520, elf=0x20c590) at core-file.c:454
#3  0xfffff80100261b18 in parse_opt (key=<optimized out>, arg=0x7fefffff8cb "testcore-rtlib-ppc") at argp-std.c:207
#4  0xfffff801006965a0 in argp_parse () from /lib64/libc.so.6
#5  0x00000000001020ec in main (argc=<optimized out>, argv=0x7fefffff6a8) at unstrip.c:2289
(gdb) list
102         return false;
103       }
104
105       for (size_t i = 0; i < size / sizeof (Elf64_auxv_t); ++i)
106         {
107           if (check64 (i))
108             {
109               *elfclass = ELFCLASS64;
110               return true;
111             }
(gdb) print i
$9 = 0


[mockbuild@localhost elfutils-0.155]$ vi libdwfl/link_map.c +107

     60   const union
     61   {
     62     char buf[size];
     63     Elf32_auxv_t a32[size / sizeof (Elf32_auxv_t)];
     64     Elf64_auxv_t a64[size / sizeof (Elf64_auxv_t)];
     65   } *u = auxv;
     66 
     67   inline bool check64 (size_t i)
     68   {
     69     if (u->a64[i].a_type == BE64 (PROBE_TYPE)
     70         && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64))
     71       {
     72         *elfdata = ELFDATA2MSB;
     73         return true;
     74       }
     75 
     76     if (u->a64[i].a_type == LE64 (PROBE_TYPE)
     77         && u->a64[i].a_un.a_val == LE64 (PROBE_VAL64))
     78       {
     79         *elfdata = ELFDATA2LSB;
     80         return true;
     81       }
     82 
     83     return false;
     84   }

Oh aye,.. here we go with unions again 8/

Comment 14 Bryce 2013-01-04 05:52:21 UTC
Ok having rebuilt the libdwfl as -O0 -g3 without FORTIFY_SOURCE and -fno-inline,.... -grumble-


Program received signal SIGBUS, Bus error.
0xfffff801002736b0 in check64 (i=0) at link_map.c:70
70          if (u->a64[i].a_type == BE64 (PROBE_TYPE)
Missing separate debuginfos, use: debuginfo-install glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64
(gdb) set print pretty on
(gdb) list
65        } *u = auxv;
66
67        /* inline bool check64 (size_t i) */
68        bool check64 (size_t i)
69        {
70          if (u->a64[i].a_type == BE64 (PROBE_TYPE)
71              && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64))
72            {
73              *elfdata = ELFDATA2MSB;
74              return true;
(gdb) info locals
u = 0xfffff8010082041c
elfdata = 0x7feffffea57 ""
(gdb) print u->a64[0]
$2 = {
  a_type = 94489280534, 
  a_un = {
    a_val = 94489280534
  }
}
(gdb) info macro BE64
Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:43
#define BE64(x) (x)
(gdb) info macro PROBE_TYPE
Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:37
#define PROBE_TYPE AT_PHENT
(gdb) info macro AT_PHENT
Defined at ./../libelf/elf.h:953
  included at ./../libelf/libelf.h:35
  included at ./../libelf/gelf.h:32
  included at ./../libdw/libdw.h:32
  included at ./libdwfl.h:32
  included at /builddir/build/BUILD/elfutils-0.155/libdwfl/libdwflP.h:35
  included at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:30
#define AT_PHENT 4

Hurm guessing that that's not really a decimal value,.. switch to hex

(gdb) print /x  u->a64[i].a_type   
$4 = 0x1600000016
(gdb) print /x u->a64[i].a_un.a_val
$5 = 0x1600000016

(gdb) info macro PROBE_VAL64
Defined at /builddir/build/BUILD/elfutils-0.155/libdwfl/link_map.c:39
#define PROBE_VAL64 sizeof (Elf64_Phdr)

erk,. ok sooo what's the size of that?
(gdb) print sizeof(Elf64_Phdr)
$6 = 56

soooo that one line should demangle to

if (0x1600000016 == 4 && 0x1600000016 == 56)
= if ( 0 && 0 )

I don't get it,.. why does that produce a bus error?

Program received signal SIGBUS, Bus error.
0xfffff801002736b0 in check64 (i=0) at link_map.c:70
70          if (u->a64[i].a_type == BE64 (PROBE_TYPE)
(gdb) print /x u->a64[i].a_type
$7 = 0x1600000016
(gdb) print /x BE64 (PROBE_TYPE)
$8 = 0x4


(gdb) print /x u->a64[i].a_un.a_val
$9 = 0x1600000016
(gdb) print BE64 (PROBE_VAL64) 
$10 = 56


ideas? anyone?
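
One possible explanation, offered as a sketch rather than a diagnosis: gdb evaluates the expression itself after copying the bytes out of the inferior, so it never performs the 8-byte load that the CPU has to do. u is 0xfffff8010082041c, and 0x1c is not a multiple of 8, so a 64-bit load from u->a64[0] would fault with BUS_ADRALN on sparc64 before the comparison ever happens, whatever it would have yielded. The value 0x1600000016 also reads like two 32-bit words of 0x16 each, which is roughly what you'd expect when 32-bit auxv data is probed as if it were 64-bit. The helper names below are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: how far ADDR is past the previous multiple of
   ALIGN.  A result of 0 means the address is suitably aligned.  */
static unsigned
misalignment (uint64_t addr, unsigned align)
{
  return (unsigned) (addr % align);
}

/* Split a 64-bit value into its upper and lower 32-bit words.  */
static void
split_words (uint64_t value, uint32_t *hi, uint32_t *lo)
{
  *hi = (uint32_t) (value >> 32);
  *lo = (uint32_t) (value & 0xffffffffu);
}
```

For the address above, misalignment (u, 8) is 4 while misalignment (u, 4) is 0, so the 32-bit view of the same buffer is fine and only the 64-bit view is a problem.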

Comment 15 Bryce 2013-01-04 06:12:56 UTC
Condensed the above down to

libdwfl/link_map.c:70 

Program received signal SIGBUS, Bus error.
0xfffff801002736b0 in check64 (i=0) at link_map.c:70
70          if (u->a64[i].a_type == BE64 (PROBE_TYPE)
Missing separate debuginfos, use: debuginfo-install glibc-2.16-28.fc18.sparc64 libgcc-4.7.2-8.fc18.sparc64

(gdb) print /x (u->a64[i].a_type == BE64 (PROBE_TYPE) && u->a64[i].a_un.a_val == BE64 (PROBE_VAL64))
$1 = 0x0
(gdb) step

Program terminated with signal SIGBUS, Bus error.
The program no longer exists.

.... I'm  bemused/confused/terribly terribly frustrated and this creates a bus error HOW?

Comment 16 Bryce 2013-01-04 07:42:06 UTC
Oh, as a final thought, here is the disassembly/registers
[mockbuild@localhost tests]$ ulimit -c unlimited ; LD_LIBRARY_PATH=../libdw:../backends:../libelf:../libasm strace -f ../src/unstrip -n --core=testcore-rtlib-ppc
...
mmap(NULL, 1376256, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0xfffff80100704000
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0xfffff8010070441c} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)

[mockbuild@localhost tests]$ gdb ../src/unstrip core.37898
Core was generated by `../src/unstrip -n --core=testcore-rtlib-ppc'.
Program terminated with signal 10, Bus error.
#0  0xfffff801001536b0 in check64 (i=0) at link_map.c:69
69          if (u->a64[i].a_type == BE64 (PROBE_TYPE)
(gdb) disassemble check64
Dump of assembler code for function check64:
   0xfffff80100153698 <+0>:     save  %sp, -176, %sp
   0xfffff8010015369c <+4>:     stx  %i0, [ %fp + 0x87f ]
   0xfffff801001536a0 <+8>:     mov  %g5, %i5
   0xfffff801001536a4 <+12>:    ldx  [ %i5 + 8 ], %g2
   0xfffff801001536a8 <+16>:    ldx  [ %fp + 0x87f ], %g1
   0xfffff801001536ac <+20>:    sllx  %g1, 4, %g1
=> 0xfffff801001536b0 <+24>:    ldx  [ %g2 + %g1 ], %g1
   0xfffff801001536b4 <+28>:    cmp  %g1, 4
   0xfffff801001536b8 <+32>:    bne  %xcc, 0xfffff801001536f8 <check64+96>
   0xfffff801001536bc <+36>:    nop 
   0xfffff801001536c0 <+40>:    ldx  [ %i5 + 8 ], %g2
   0xfffff801001536c4 <+44>:    ldx  [ %fp + 0x87f ], %g1
   0xfffff801001536c8 <+48>:    sllx  %g1, 4, %g1
   0xfffff801001536cc <+52>:    add  %g2, %g1, %g1
   0xfffff801001536d0 <+56>:    ldx  [ %g1 + 8 ], %g1
   0xfffff801001536d4 <+60>:    cmp  %g1, 0x38
   0xfffff801001536d8 <+64>:    bne  %xcc, 0xfffff801001536f8 <check64+96>
   0xfffff801001536dc <+68>:    nop 
   0xfffff801001536e0 <+72>:    ldx  [ %i5 ], %g1
   0xfffff801001536e4 <+76>:    mov  2, %g2
   0xfffff801001536e8 <+80>:    stb  %g2, [ %g1 ]
   0xfffff801001536ec <+84>:    mov  1, %g1
   0xfffff801001536f0 <+88>:    b  %xcc, 0xfffff80100153770 <check64+216>
   0xfffff801001536f4 <+92>:    nop 
   0xfffff801001536f8 <+96>:    ldx  [ %i5 + 8 ], %g2
   0xfffff801001536fc <+100>:   ldx  [ %fp + 0x87f ], %g1
   0xfffff80100153700 <+104>:   sllx  %g1, 4, %g1
   0xfffff80100153704 <+108>:   ldx  [ %g2 + %g1 ], %i4
   0xfffff80100153708 <+112>:   mov  4, %o0
   0xfffff8010015370c <+116>:   call  0xfffff8010015364c <__bswap_64>
   0xfffff80100153710 <+120>:   nop 
   0xfffff80100153714 <+124>:   mov  %o0, %g1
   0xfffff80100153718 <+128>:   cmp  %i4, %g1
   0xfffff8010015371c <+132>:   bne  %xcc, 0xfffff8010015376c <check64+212>
   0xfffff80100153720 <+136>:   nop 
   0xfffff80100153724 <+140>:   ldx  [ %i5 + 8 ], %g2
   0xfffff80100153728 <+144>:   ldx  [ %fp + 0x87f ], %g1
   0xfffff8010015372c <+148>:   sllx  %g1, 4, %g1
   0xfffff80100153730 <+152>:   add  %g2, %g1, %g1
   0xfffff80100153734 <+156>:   ldx  [ %g1 + 8 ], %i4
   0xfffff80100153738 <+160>:   mov  0x38, %o0
   0xfffff8010015373c <+164>:   call  0xfffff8010015364c <__bswap_64>
   0xfffff80100153740 <+168>:   nop 
   0xfffff80100153744 <+172>:   mov  %o0, %g1
   0xfffff80100153748 <+176>:   cmp  %i4, %g1
   0xfffff8010015374c <+180>:   bne  %xcc, 0xfffff8010015376c <check64+212>
   0xfffff80100153750 <+184>:   nop 
   0xfffff80100153754 <+188>:   ldx  [ %i5 ], %g1
   0xfffff80100153758 <+192>:   mov  1, %g2
   0xfffff8010015375c <+196>:   stb  %g2, [ %g1 ]
   0xfffff80100153760 <+200>:   mov  1, %g1
   0xfffff80100153764 <+204>:   b  %xcc, 0xfffff80100153770 <check64+216>
   0xfffff80100153768 <+208>:   nop 
   0xfffff8010015376c <+212>:   clr  %g1        ! 0x0
   0xfffff80100153770 <+216>:   and  %g1, 0xff, %g1
   0xfffff80100153774 <+220>:   mov  %g1, %i0
   0xfffff80100153778 <+224>:   rett  %i7 + 8
   0xfffff8010015377c <+228>:   nop 
End of assembler dump.
(gdb) info registers
g0             0x0      0
g1             0x0      0
g2             0xfffff8010070441c       -8791790697444
g3             0xc      12
g4             0xc8     200
g5             0x7feffda89b0    8791795599792
g6             0x454c465554494c53       4993443419147291731
g7             0xfffff801000030d0       -8791798042416
o0             0x0      0
o1             0x0      0
o2             0x0      0
o3             0x0      0
o4             0x0      0
o5             0x0      0
sp             0x7feffda8031    0x7feffda8031
o7             0x0      0
l0             0x0      0
l1             0x0      0
l2             0x0      0
l3             0x0      0
l4             0x0      0
l5             0x0      0
l6             0x0      0
l7             0x0      0
i0             0x0      0
i1             0x0      0
i2             0x0      0
i3             0x0      0
i4             0x0      0
i5             0x7feffda89b0    8791795599792
fp             0x7feffda80e1    0x7feffda80e1
i7             0xfffff80100153824       -8791796664284
pc             0xfffff801001536b0       0xfffff801001536b0 <check64+24>
npc            0xfffff801001536b4       0xfffff801001536b4 <check64+28>
state          0xf0001206       4026536454
fsr            0x0      [ ]
fprs           *value not available*
y              0x0      0
cwp            0x6      6
pstate         0x12     [ IE PEF ]
asi            0xf0     240
ccr            0x0      0
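
The strace line and the register dump above already point at the cause: si_code is BUS_ADRALN, and the faulting instruction is a 64-bit ldx from [%g2 + %g1]. On SPARC V9, ldx needs an 8-byte-aligned address. The following is only a minimal sketch that redoes that arithmetic. It assumes %g2 holds the u->a64 pointer and %g1 holds i << 4, as the disassembly suggests. The helper names is_aligned and faulting_address are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if addr is a multiple of align.
   align must be a nonzero power of two. */
static int is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Effective address of "ldx [%g2 + %g1], %g1", using the register
   values shown in the dump (assumption: g2 = u->a64, g1 = i << 4). */
static uint64_t faulting_address(void)
{
    const uint64_t g2 = 0xfffff8010070441cULL; /* from "info registers" */
    const uint64_t g1 = 0;                     /* i == 0, so i << 4 == 0 */
    return g2 + g1;
}
```

With these values, faulting_address() equals the si_addr that strace reported, is_aligned(addr, 4) is true and is_aligned(addr, 8) is false. The data sits 0x41c bytes into the mmap'd core file, which is a multiple of 4 but not of 8. gdb's print does not trap on this address, which would explain why the comparison can be evaluated in the debugger while the compiled ldx cannot execute.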

Comment 17 Petr Machata 2013-01-04 15:21:26 UTC
Created attachment 672431 [details]
fix for unaligned accesses

I reproduced this on Itanium, which is also sensitive to unaligned access (although whether it raises SIGBUS is configurable with prctl).
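
For readers unfamiliar with this class of fix, a common way to avoid unaligned loads in C is to copy the bytes into an aligned local with memcpy instead of dereferencing a possibly misaligned pointer. The sketch below illustrates that general technique only. It is not the content of the attached patch, and read_u64_unaligned and probe_matches are hypothetical names, with probe_matches standing in for the kind of comparison check64 performs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value in host byte order from an address that may
   not be 8-byte aligned. memcpy has no alignment requirement. */
static uint64_t read_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical stand-in for the probe: entry i is 16 bytes, a 64-bit
   type followed by a 64-bit value, both in host byte order here. */
static int probe_matches(const unsigned char *auxv, size_t i,
                         uint64_t type, uint64_t val)
{
    const unsigned char *entry = auxv + i * 16;
    return read_u64_unaligned(entry) == type
           && read_u64_unaligned(entry + 8) == val;
}
```

Compilers typically turn the fixed-size memcpy into a single load on architectures that tolerate unaligned access and into byte-wise loads on those that do not, so this approach usually costs nothing where the original code already worked.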

Comment 18 Bryce 2013-01-04 16:46:50 UTC
[mockbuild@localhost elfutils-0.155]$ make check TESTS="run-unstrip-n.sh"
PASS: run-unstrip-n.sh
=============
1 test passed
=============


And with that,...
[mockbuild@localhost elfutils-0.155]$ make check
===================
All 88 tests passed
===================

Yay! Finally we can all go back to sleep again. Thank you very much for your time and help.

Phil
=--=

Comment 19 Roland McGrath 2013-01-07 23:17:48 UTC
I've fixed all the unaligned access issues on the trunk.
I tested it with a build done with -m64 on a sparc64 machine.

We probably don't need to backport these fixes to Fedora packages, but can just make another release before too long instead.  Mark can decide whether to just close this bug as NEXTRELEASE or to do the backports in the Fedora package.

On the machine I have access to, there are some elflint-self failures that look like binutils bugs (all the same one bug).  But that machine is not Fedora so I don't know if Fedora binutils produces any such problems.

Comment 20 Mark Wielaard 2013-01-08 09:03:00 UTC
(In reply to comment #19)
> I've fixed all the unaligned access issues on the trunk.
> I tested it with a build done with -m64 on a sparc64 machine.

Thanks.
BTW, are sparc64 and ia64 the only arches on which these issues show up?

> We probably don't need to backport these fixes to Fedora packages, but can
> just make another release before too long instead.  Mark can decide whether
> to just close this bug as NEXTRELEASE or to do the backports in the Fedora
> package.

If the reporter wants I can add the patches to the rawhide package. Please just yell.
 
> On the machine I have access to, there are some elflint-self failures that
> look like binutils bugs (all the same one bug).  But that machine is not
> Fedora so I don't know if Fedora binutils produces any such problems.

I am afraid this is upstream ld binutils:
http://sourceware.org/bugzilla/show_bug.cgi?id=13621
Exposed when using gcc 4.7+ I believe.
There used to be a fix for it by rth. Which was also backported to fedora.
But it turned out that caused a different issue and so was then reverted again.

The issue is that GNU ld (but not gold) removes "empty sections" even though there might be symbols for that section. rth's fix just marked such symbols as absolute (which apparently confused the Linux kernel). I tried to create a patch to keep track of symbols associated with sections so they wouldn't get discarded even when the sections were empty. But BFD makes that a lot of work and I haven't tracked down everywhere that this info needs to be passed around yet.

We might have to add some extra hack to elflint --gnu-ld to allow "dangling symbols". But I couldn't think of a good way to detect them, since the symbol section that GNU ld assigns to these symbols is basically random (I believe it is just the old section number, but all sections are renumbered).

Comment 21 Bryce 2013-01-08 10:38:28 UTC
Would be nice, thanks. (rawhide)
I've been running the patches out of tree for the past few days and haven't seen anything terribly wrong with elfutils/libs as yet. (Though there are loads of other fun issues I'm trying to beat down in other packages which rely on elfutils).

Thanks.

Phil
=--=

Comment 22 Roland McGrath 2013-01-08 18:45:53 UTC
(In reply to comment #20)
> > On the machine I have access to, there are some elflint-self failures that
> > look like binutils bugs (all the same one bug).  But that machine is not
> > Fedora so I don't know if Fedora binutils produces any such problems.
> 
> I am afraid this is upstream ld binutils:
> http://sourceware.org/bugzilla/show_bug.cgi?id=13621
> Exposed when using gcc 4.7+ I believe.
> There used to be a fix for it by rth. Which was also backported to fedora.
> But it turned out that caused a different issue and so was then reverted
> again.
> 
> The issue is that GNU ld (but not gold) removes "empty sections" even though
[...]

That is not the issue I saw on davem's sparc64 machine.  What I saw was
bogus .gnu.attributes sections, which is unrelated.

Comment 23 Mark Wielaard 2013-01-10 10:37:00 UTC
(In reply to comment #21)
> Would be nice, thanks. (rawhide)
> I've been running the patches out of tree for the past few days and haven't
> seen anything terribly wrong with elfutils/libs as yet. (Though there are
> loads of other fun issues I'm trying to beat down in other packages which
> rely on elfutils).

elfutils-0.155-2.fc19 has the patches.
Don't know if/when the sparc koji picks it up.

Comment 24 Bryce 2013-01-10 11:41:27 UTC
Not for a while (I'm trying to get the fc18 buildroot stable to hand over to spot/dgilmore, but... yi yi yi... so many issues!)

Phil  (attempting to get the fedora core secondary archs - sparc  stuff working)
=--=

