Bug 1052415 - resolving addresses to symbols using MiniDebugInfo fails on ppc64
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rpm
Version: 7.0
Hardware: ppc64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Panu Matilainen
QA Contact: Patrik Kis
Docs Contact:
Depends On:
Blocks: 1012790 1056145
Reported: 2014-01-13 14:23 EST by Martin Milata
Modified: 2014-06-17 22:15 EDT

See Also:
Fixed In Version: rpm-4.11.1-11.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 07:03:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Symbol tables from both minidebuginfo and separate debuginfo (30.90 KB, text/plain)
2014-01-14 06:45 EST, Martin Milata

Description Martin Milata 2014-01-13 14:23:26 EST
Description of problem:
Resolving code address to symbol, e.g. when running eu-stack, seems to fail when MiniDebugInfo is used on PPC64. I always got 0000003b.plt_call.SOMETHING@@GLIBC_2.3 as the symbol name instead.

Version-Release number of selected component (if applicable):
elfutils-0.158-1.el7.ppc64

How reproducible: Always.

Steps to Reproduce:
ppc$ ulimit -c unlimited
ppc$ sleep 1d &
ppc$ kill -11 $!
ppc$ eu-stack -e `which sleep` --core core.*

Actual results:
PID 22303 - core
TID 22303:
#0  0x00003fff88098808 __nanosleep
#1  0x0000000010004f30 0000003b.plt_call.free@@GLIBC_2.3
#2  0x0000000010004ca4 0000003b.plt_call.free@@GLIBC_2.3
#3  0x00000000100017e4 0000003b.plt_call.free@@GLIBC_2.3
#4  0x00003fff87ff44cc generic_start_main.isra.0
eu-stack: dwfl_thread_getframes tid 22303 at 0x3fff87ff44cb in libc.so.6: No DWARF information found

Expected results:
Something similar to what I get on x86_64:
x86$ ulimit -c unlimited
x86$ sleep 1d &
x86$ kill -11 $!
x86$ eu-stack -e `which sleep` --core core.*
PID 21377 - core
TID 21377:
#0  0x00007fb51e3f0720 __nanosleep
#1  0x0000000000403d37 rpl_nanosleep
#2  0x0000000000403bf0 xnanosleep
#3  0x00000000004016bd main
#4  0x00007fb51e354af5 __libc_start_main
#5  0x00000000004017c9 _start

Additional info:
I'm aware that the dwfl_thread_getframes error is a separate issue which would be difficult to fix for rhel7.
Comment 2 Mark Wielaard 2014-01-13 15:57:23 EST
(In reply to Martin Milata from comment #0)
> Description of problem:
> Resolving code address to symbol, e.g. when running eu-stack, seems to fail
> when MiniDebugInfo is used on PPC64. I always got
> 0000003b.plt_call.SOMETHING@@GLIBC_2.3 as the symbol name instead.

Hmmmm. Did you try installing the debuginfo for the program to make sure it was the minisymtab? What does the backtrace look like if you add the debuginfo packages with debuginfo-install coreutils?

If that does work correctly it might be that the minisymtab table is not set up correctly on ppc64. Could you show how it looks with eu-readelf --elf-section --symbols /usr/bin/sleep?

Could you also compare it with, or attach, the full symbol table from the debug file, obtained with
eu-readelf --symbols /usr/lib/debug/usr/bin/sleep.debug?
Comment 3 Martin Milata 2014-01-14 06:43:51 EST
(In reply to Mark Wielaard from comment #2)
> Hmmmm. Did you try installing the debuginfo for the program to make sure it
> was the minisymtab? What does the backtrace look like if you add the
> debuginfo packages with debuginfo-install coreutils?

With debuginfo, the output seems to be alright:

ppc$ yum install coreutils-debuginfo
ppc$ eu-stack -e `which sleep` --core core.*
PID 31401 - core
TID 31401:
#0  0x00003fffa7a68808 __nanosleep
#1  0x0000000010004f30 rpl_nanosleep
#2  0x0000000010004ca4 xnanosleep
#3  0x00000000100017e4 main
#4  0x00003fffa79c44cc generic_start_main.isra.0
eu-stack: dwfl_thread_getframes tid 31401 at 0x3fffa79c44cb in libc.so.6: No DWARF information found

> If that does work correctly it might be that the minisymtab table is not
> set up correctly on ppc64. Could you show how it looks with eu-readelf
> --elf-section --symbols /usr/bin/sleep?
> 
> Could you also compare it with, or attach, the full symbol table from the
> debug file, obtained with
> eu-readelf --symbols /usr/lib/debug/usr/bin/sleep.debug?

Please see the attachment for output of these commands.
Comment 4 Martin Milata 2014-01-14 06:45:00 EST
Created attachment 849911
Symbol tables from both minidebuginfo and separate debuginfo
Comment 5 Mark Wielaard 2014-01-14 07:15:47 EST
Thanks, that is very interesting. So the issue is that the .gnu_debugdata section embedded in the executable doesn't actually contain any function symbols at all.

For example, you would expect the xnanosleep one to be there. It does, however, contain non-STT_FUNC symbols like these 00000000.plt_call. ones. This suggests that the .gnu_debugdata was generated incorrectly.

What we could do to make the output a little less confusing is refuse to show the names of non-STT_FUNC symbols that we do find and conclude are "closest" to the address. But that doesn't really solve your issue, since we would then just print the address without any name.

So let's reassign this to rpm first so they can figure out what is going wrong with the .gnu_debugdata minisymtab generation on ppc64 in the first place.
Comment 6 Mark Wielaard 2014-01-14 10:32:56 EST
I looked on an actual ppc64 install and found the issue.
rpm's find-debuginfo.sh does the following to find the function symbols:

add_minidebug()
{
  local debuginfo="$1"
  local binary="$2"

  local dynsyms=`mktemp`
  local funcsyms=`mktemp`
  local keep_symbols=`mktemp`
  local mini_debuginfo=`mktemp`

  # Extract the dynamic symbols from the main binary, there is no need to also have these
  # in the normal symbol table
  nm -D "$binary" --format=posix --defined-only | awk '{ print $1 }' | sort > "$dynsyms"
  # Extract all the text (i.e. function) symbols from the debuginfo 
  nm "$debuginfo" --format=posix --defined-only | awk '{ if ($2 == "T" || $2 == "t") print $1 }' | sort > "$funcsyms"
  # Keep all the function symbols not already in the dynamic symbol table
  comm -13 "$dynsyms" "$funcsyms" > "$keep_symbols"
  # Copy the full debuginfo, keeping only a minumal set of symbols and removing some unnecessary sections
  objcopy -S --remove-section .gdb_index --remove-section .comment --keep-symbols="$keep_symbols" "$debuginfo" "$mini_debuginfo" &> /dev/null
  #Inject the compressed data into the .gnu_debugdata section of the original binary
  xz "$mini_debuginfo"
  mini_debuginfo="${mini_debuginfo}.xz"
  objcopy --add-section .gnu_debugdata="$mini_debuginfo" "$binary"
  rm -f "$dynsyms" "$funcsyms" "$keep_symbols" "$mini_debuginfo"
}

Where things seem to go wrong is at this step:
  # Extract all the text (i.e. function) symbols from the debuginfo 
  nm "$debuginfo" --format=posix --defined-only | awk '{ if ($2 == "T" || $2 == "t") print $1 }' | sort > "$funcsyms"

If we take sleep as example then it will not find any real function symbols because:

nm /usr/lib/debug/usr/bin/sleep.debug --format=posix --defined-only | grep xnanosleep
xnanosleep B 00000000100204b0 00000000000000cc

Note that it says 'B' ("The symbol is in the uninitialized data section (known as BSS)."), not 'T' ("The symbol is in the text (code) section.").

This is because binutils nm does see an STT_FUNC symbol, but decides that its value points into the .opd section, which has type NOBITS in the .debug file, and so reports it as 'B'. I guess that is one valid interpretation but not a very useful one in this case.

One workaround would be to use elfutils eu-nm instead which does say the type is "T":

eu-nm /usr/lib/debug/usr/bin/sleep.debug --format=posix --defined-only | grep xnanosleep
xnanosleep T 00000000100204b0 00000000000000cc

Another would be to file a bug against binutils nm (assuming the eu-nm interpretation of the symbol type is the correct one).
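
To make the failure mode concrete, here is a minimal sketch that runs the existing T/t filter from add_minidebug() over a few lines of posix-format nm output. Only the xnanosleep line is taken from the ppc64 debug file above; the other two symbols and the function names sample_posix and posix_func_filter are made up for illustration.

```shell
# Sample `nm --format=posix --defined-only` output.
# The xnanosleep line is copied from the ppc64 .debug file above;
# the example_* lines are hypothetical, added only for contrast.
sample_posix() {
  printf '%s\n' \
    'xnanosleep B 00000000100204b0 00000000000000cc' \
    'example_text_sym T 0000000010001000 0000000000000010' \
    'example_data_sym D 0000000010030000 0000000000000008'
}

# Same awk filter as in add_minidebug(): keep names typed T or t.
posix_func_filter() {
  awk '{ if ($2 == "T" || $2 == "t") print $1 }' | sort
}

# xnanosleep is dropped because nm reported it as B, not T.
sample_posix | posix_func_filter
# prints: example_text_sym
```

With real ppc64 input every function descriptor symbol would presumably be dropped the same way, which matches the empty function list seen in .gnu_debugdata.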
Comment 7 Jan Kratochvil 2014-01-14 10:45:42 EST
GDB needed a ppc64 fix which apparently has not made it into find-debuginfo.sh:

https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=e58fcc15916bdb8be2b318a685430a597824a606

Just the fix uses ELF symbol type "D" while the ELF symbol type is "B" in your example above.
Comment 8 Mark Wielaard 2014-01-14 10:47:48 EST
Oh, another solution might be to use --format=sysv, which prints the ELF symbol type explicitly (without any interpretation):

nm /usr/lib/debug/usr/bin/sleep.debug --format=sysv --defined-only | grep xnanosleep
xnanosleep          |00000000100204b0|   B  |              FUNC|00000000000000cc|     |.opd

That would make the add_minidebug script line match field 4 against FUNC:
nm "$debuginfo" --format=sysv --defined-only | awk -F \| '{ if ($4 ~ "FUNC") print $1 }'
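
As a quick check of that awk expression, the sketch below feeds it the sysv line shown above plus one made-up OBJECT symbol. The function names and the example_data_sym line are hypothetical. Note that in the sample line the name field is space padded, so this sketch trims trailing whitespace with sed; that trimming is an assumption of the sketch, not part of the proposed script line.

```shell
# Sample `nm --format=sysv --defined-only` lines.
# The xnanosleep line is copied from the output above; the
# example_data_sym line is hypothetical, added only for contrast.
sample_sysv() {
  printf '%s\n' \
    'xnanosleep          |00000000100204b0|   B  |              FUNC|00000000000000cc|     |.opd' \
    'example_data_sym    |0000000010030000|   D  |            OBJECT|0000000000000008|     |.data'
}

# Split on '|' and keep names whose ELF type field (field 4) is FUNC.
# The sed step strips the column padding from the name (sketch-only assumption).
sysv_func_filter() {
  awk -F '|' '{ if ($4 ~ "FUNC") print $1 }' | sed 's/[[:space:]]*$//' | sort
}

# xnanosleep is kept even though its class char is B.
sample_sysv | sysv_func_filter
# prints: xnanosleep
```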
Comment 9 Mark Wielaard 2014-01-14 10:54:46 EST
(In reply to Jan Kratochvil from comment #7)
> GDB needed a ppc64 fix which apparently has not made it into
> find-debuginfo.sh:
> 
> https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;
> h=e58fcc15916bdb8be2b318a685430a597824a606
> 
> Just the fix uses ELF symbol type "D" while the ELF symbol type is "B" in
> your example above.

The reason for the difference between "D" ("The symbol is in the initialized data section.") and "B" ("The symbol is in the uninitialized data section (known as BSS).") might be that the gdb testsuite is matching against the main ELF file (where .opd is initialized data) while the rpm script is matching against the debug file (where .opd is NOBITS).

I think using --format=sysv might be preferable, since it matches the ELF symbol type directly instead of relying on the somewhat ambiguous symbol type char that binutils nm prints in --format=posix.
Comment 10 Jan Kratochvil 2014-01-14 11:02:06 EST
Just post a patch, according to the past experience rpm maintainers left /usr/lib/rpm/find-debuginfo.sh maintenance to the Tools people.
Comment 11 Mark Wielaard 2014-01-14 12:25:22 EST
(In reply to Jan Kratochvil from comment #10)
> Just post a patch, according to the past experience rpm maintainers left
> /usr/lib/rpm/find-debuginfo.sh maintenance to the Tools people.

I couldn't find the add_minidebug () code upstream, so I am not sure where it is maintained or by whom. But the proposed patch is simple (against my installed version of find-debuginfo.sh):

--- find-debuginfo.sh.orig	2014-01-14 18:20:52.969317796 +0100
+++ find-debuginfo.sh.new	2014-01-14 18:23:40.572867474 +0100
@@ -149,7 +149,10 @@
   # in the normal symbol table
   nm -D "$binary" --format=posix --defined-only | awk '{ print $1 }' | sort > "$dynsyms"
   # Extract all the text (i.e. function) symbols from the debuginfo 
-  nm "$debuginfo" --format=posix --defined-only | awk '{ if ($2 == "T" || $2 == "t") print $1 }' | sort > "$funcsyms"
+  # Use format sysv to make sure we can match against the actual ELF FUNC
+  # symbol type. The binutils nm posix format symbol type chars are
+  # ambigous for architectures that might use function descriptors.
+  nm "$debuginfo" --format=sysv --defined-only | awk -F \| '{ if ($4 ~ "FUNC") print $1 }' | sort > "$funcsyms"
   # Keep all the function symbols not already in the dynamic symbol table
   comm -13 "$dynsyms" "$funcsyms" > "$keep_symbols"
   # Copy the full debuginfo, keeping only a minumal set of symbols and removing some unnecessary sections

This of course needs some testing to make sure it actually works correctly on ppc64 and doesn't change the .gnu_debugdata section on other architectures.
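
One possible way to do that testing, sketched under assumptions: save the posix and sysv nm output for the same .debug file on each architecture, then compare what the old and new filters select. The helper name compare_func_filters and its two file arguments are hypothetical, and the sed step that strips the sysv column padding is an addition of this sketch, not part of the patch.

```shell
# Usage: compare_func_filters POSIX_DUMP SYSV_DUMP
# Both arguments are files holding saved `nm --defined-only` output for the
# same .debug file, one in --format=posix and one in --format=sysv.
# Prints the names on which the old and new filters disagree:
#   column 1 = selected only by the old posix T/t filter
#   column 2 = selected only by the new sysv FUNC filter
# Prints nothing when both filters select the same names.
compare_func_filters() {
  old=$(mktemp) && new=$(mktemp) || return 1
  # Old filter from find-debuginfo.sh.orig
  awk '{ if ($2 == "T" || $2 == "t") print $1 }' "$1" | sort > "$old"
  # New filter from the patch; trailing padding trimmed (sketch-only assumption)
  awk -F '|' '{ if ($4 ~ "FUNC") print $1 }' "$2" \
    | sed 's/[[:space:]]*$//' | sort > "$new"
  comm -3 "$old" "$new"
  rm -f "$old" "$new"
}
```

An empty result on an architecture would suggest the patch leaves the selected symbol list unchanged there, while on ppc64 the second column should list the previously missing functions. The two filters might still legitimately differ elsewhere, for example on text symbols that are not of type FUNC, so any difference needs a look rather than being an automatic failure.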
Comment 13 Martin Milata 2014-01-15 10:48:20 EST
(In reply to Mark Wielaard from comment #11)
> I couldn't find the add_minidebug () code upstream, so I am not sure where
> it is maintained or by whom. But the proposed patch is simple (against my
> installed version of find-debuginfo.sh):

AFAIK the code lives as a patch in dist-git.
Comment 16 Panu Matilainen 2014-01-16 07:53:38 EST
Fixed in rpm-4.11.1-11.el7 (by the patch from comment #11)
Comment 18 Alexander Larsson 2014-01-20 04:49:22 EST
The patch looks good to me.
Comment 20 Ludek Smid 2014-06-13 07:03:03 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.