Bug 746823 - oprofile and sysprof fail to read symbol files [regression]
Summary: oprofile and sysprof fail to read symbol files [regression]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: oprofile
Version: 16
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-10-17 20:51 UTC by Clemens Eisserer
Modified: 2013-02-26 15:11 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 896555 (view as bug list)
Environment:
Last Closed: 2013-02-26 15:10:59 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Clemens Eisserer 2011-10-17 20:51:30 UTC
Description of problem:

Sysprof and Oprofile fail to read symbol files, even with debuginfo packages installed. 
Even with debuginfo-packages installed, the information presented is equal to what is shown without those packages installed - which makes both system-wide profiles much less useful.

Worked with Fedora-14, regressed with Fedora 15 and is still broken in Fedora 16 Beta1.

Version-Release number of selected component (if applicable):

Fedora 16 Beta
sysprof-1.1.8-1.fc16.i686
oprofile-0.9.6-21.fc16.i686
binutils-2.21.53.0.1-2.fc16.i686
kernel-3.1.0-0.rc9.git0.0.fc16.i686

How reproducible: Always


Steps to Reproduce:
1. Install all required debuginfo-packages
2. Start profiling session with oprofile or sysprof
3. 
  
Actual results:
Symbol files not read, therefor the information presented is quite sparse (e.g. no exact call-trees)


Expected results: Both, oprofile and sysprof should be able to read the installed symbol files

Additional info:

Comment 1 William Cohen 2011-10-18 02:12:37 UTC
This might be a problem with oprofile AND sysprof, but it seem likely that binutils is culprit because they both fail. I was able to replicate this on f15 (AND f14) x86-64 vm with oprofile.  I was able to replicate with the following commands run as root:

modprobe oprofile timer=1
opcontrol --reset
opcontrol --setup --no-vmlinux --separate=library
opcontrol --start
# run things for a while to get some samples for /usr/bin/gnome-terminal
opcontrol --shutdown
debuginfo-install gnome-terminal -y
opreport -t1 -l /usr/bin/gnome-terminal

Output should have breakdown of time spent in various routines in various libraries. However, get library info lumped together in opreport:


$ opreport -t1 -l /usr/bin/gnome-terminal
Overflow stats not available
warning: [vdso] (tgid:1642 range:0x7fff387ff000-0x7fff38800000) could not be found.
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name               symbol name
656      23.5041  libcairo.so.2.11000.2    /usr/lib64/libcairo.so.2.11000.2
527      18.8821  libglib-2.0.so.0.2800.8  /lib64/libglib-2.0.so.0.2800.8
347      12.4328  libgobject-2.0.so.0.2800.8 /lib64/libgobject-2.0.so.0.2800.8
259       9.2798  libgtk-3.so.0.0.12       /usr/lib64/libgtk-3.so.0.0.12
198       7.0942  libvte2_90.so.9.2800.1   /usr/lib64/libvte2_90.so.9.2800.1
156       5.5894  libgdk-3.so.0.0.12       /usr/lib64/libgdk-3.so.0.0.12
53        1.8990  libX11.so.6.3.0          /usr/lib64/libX11.so.6.3.0
41        1.4690  libc-2.14.so             _int_malloc
37        1.3257  libc-2.14.so             __strcmp_sse2
35        1.2540  libpthread-2.14.so       pthread_mutex_unlock
32        1.1465  libpthread-2.14.so       pthread_mutex_lock

Even though the debuginfo are installed
$ rpm -q cairo cairo-debuginfo glib2 glib2-debuginfo 
cairo-1.10.2-3.fc15.x86_64
cairo-debuginfo-1.10.2-3.fc15.x86_64
glib2-2.28.8-1.fc15.x86_64
glib2-debuginfo-2.28.8-1.fc15.x86_64

Note that some libraries do have functions, such as libc-2.14.so and libpthread-2.14.so. Looking through the output of "opreport --verbose=all -t1 -l /usr/bin/gnome-terminal" see that the libcairo.so debuginfo is found but it looks like the symbols are being filtered out:

op_bfd ctor for /usr/lib64/libcairo.so.2.11000.2
bfd_info::get_symbols() for /usr/lib64/libcairo.so.2.11000.2
bfd_get_symtab_upper_bound: 8
bfd_canonicalize_symtab: 0
fetching .gnu_debuglink section
.gnu_debuglink section has size 20
.gnu_debuglink filename is libcairo.so.2.11000.2.debug
looking for debugging file libcairo.so.2.11000.2.debug with crc32 = 31da1180
found /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug with crc32 = 31da1180
now loading: /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug
bfd_info::get_symbols() for /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug
bfd_get_symtab_upper_bound: 21456
bfd_canonicalize_symtab: 2681
number of symbols before filtering 1
number of symbols now 1
start_offset is now 0
symbol /usr/lib64/libcairo.so.2.11000.2, value 0
start 0, end a9028

Comment 2 Nick Clifton 2011-10-18 11:59:24 UTC
Hi Will,

  This probably is a binutils bug, but I am going to need your help to track it down...

  I do not think that the problem is with the symbol filtering, since the verbose log is reporting that the number of symbols *before* filtering is 1.  I do notice something rather suspicious however.  In the cases where the number of symbols before filtering is 1 then the log either reports:

  failed to process separate debug file 
  number of symbols before filtering 1

Or else (to take one example):

  bfd_info::get_symbols() for /usr/lib/debug/lib64/libgio-2.0.so.0.2800.8.debug
  bfd_get_symtab_upper_bound: 50960
  bfd_canonicalize_symtab: 6369
  number of symbols before filtering 1

Which implies that there are 50960 bytes of data in the debug info file and that after reading this in there are 6369 symbols that have been processed.  So why does the symbol count then appear to be reset to 1 ?

Strangely in the cases where, I presume, the symbols are successfully loaded the log reports:

  bfd_info::get_symbols() for /usr/lib/debug/lib64/libc-2.14.so.debug
  bfd_get_symtab_upper_bound: 0
  number of symbols before filtering 3308

Which suggests that there was no data found in the debug info file and that the symbols have been (magically) found elsewhere.

So, in the oprofile sources, what happens between bfd_info::get_symbols() and op_bfd::add_symbols() ?

Cheers
  Nick

Comment 3 William Cohen 2011-10-18 14:42:52 UTC
The source code for oprofile for the oprofile in fc15 is pretty much same as upstream in libutils++. Can view the code in the git tree at:

http://oprofile.git.sourceforge.net/git/gitweb.cgi?p=oprofile/oprofile;a=tree;f=libutil%2B%2B;

Adding some annotation where the output messages to describe where
they are coming from and what is going on:


now loading: /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug
bfd_info::get_symbols() for
/usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug
bfd_get_symtab_upper_bound: 21456
  -above from libutils++/bfd_support.cpp bfd_info::get_symbols()
  -bfd_canonicalize_symtab(abfd, syms.get()) call between this and the next.

bfd_canonicalize_symtab: 2681
  -also from libutils++/bfd_support.cpp bfd_info::get_symbols() also
  -a call is made to libutils++/bfd_support.cpp:translate_debuginfo_syms()
       in get_symbols() and that might be affecting the  symbols

number of symbols before filtering 1
   -from libutils++/op_bfd.cpp op_bfd::add_symbols


number of symbols now 1
start_offset is now 0
symbol /usr/lib64/libcairo.so.2.11000.2, value 0
start 0, end a9028

Time to look and see how bfd_info::translate_debuginfo_syms() operates.

Comment 4 William Cohen 2011-10-18 15:36:06 UTC
It looks like things are not working well with prelink'ed files.

Undid the prelink on the f14 machine with:

prelink -u --all

Then reran the reproducer. Got output for the various functions in libcairo.so.2.11000.2  and other shared libs:



# opreport -l -t1 /usr/bin/gnome-terminal 
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name               symbol name
18        4.0449  libcairo.so.2.11000.2    _cairo_bentley_ottmann_tessellate_polygon
9         2.0225  ld-2.13.so               do_lookup_x
9         2.0225  libglib-2.0.so.0.2600.0  g_atomic_pointer_get
7         1.5730  libc-2.13.so             malloc
7         1.5730  libglib-2.0.so.0.2600.0  g_main_context_check
7         1.5730  libvte.so.9.2600.1       vte_terminal_draw_rows.clone.6
6         1.3483  libvte.so.9.2600.1       vte_terminal_determine_colors_internal
5         1.1236  libc-2.13.so             __strcmp_sse2
5         1.1236  libc-2.13.so             _int_free
5         1.1236  libc-2.13.so             free
5         1.1236  libglib-2.0.so.0.2600.0  g_string_insert_unichar
5         1.1236  libgobject-2.0.so.0.2600.0 g_type_check_instance_cast
5         1.1236  libgobject-2.0.so.0.2600.0 g_type_check_instance_is_a

Comment 5 William Cohen 2011-10-19 14:51:32 UTC
Clemens, the work around at the moment is to undo the prelinking with:

prelink -u --all

Note that the prelinking changes the files and oprofile is going to notice that, so already loaded executable are not going to get oprofile samples against them because of difference beteen what is load and what is in the file. Maybe reboot the machine to ensure that everything is running the non-prelinked versions of the shared libraries.

Still need to find out why oprofile and sysprof getting messed up because of the prelink of the shared libraries.

Comment 6 William Cohen 2011-10-19 17:56:43 UTC
Seems that the interesting_symbol() function is filtering out all the symbols in the prelinked shared libs. Seems that the following statement in interesting_symbol() is filtering out everything in the prelinked libraries:

	if (!(sym->section->flags & SEC_LOAD))
		return false;

Used gdb with conditional breakpoints to stop opreport in the same place, op_bfd::get_symbols() reading in /usr/lib64/libcairo.so.2.11000.2 looking at "call_gmon_start" symbol just before the call to interesting_symbol at line 232.  

for problem (prelinked) one:
(gdb) print *dbfd.syms[i]
$33 = {the_bfd = 0x8981b0, name = 0x7dd821 "call_gmon_start", value = 0, 
  flags = 9, section = 0x88e5c8, udata = {p = 0x0, i = 0}}
(gdb) print *dbfd.syms[i]->section
$34 = {name = 0x8c1cae ".text", id = 1071, index = 10, next = 0x88e6f0, 
  prev = 0x88e4a0, flags = 25, user_set_vma = 1, linker_mark = 0, 
  linker_has_input = 0, gc_mark = 0, segment_mark = 0, sec_info_type = 0, 
  use_rela_p = 1, sec_flg0 = 0, sec_flg1 = 0, sec_flg2 = 0, sec_flg3 = 0, 
  sec_flg4 = 0, sec_flg5 = 0, vma = 44832, lma = 44832, size = 499832, 
  rawsize = 0, relax = 0x0, relax_count = 0, output_offset = 0, 
  output_section = 0x0, alignment_power = 4, relocation = 0x0, 
  orelocation = 0x0, reloc_count = 0, filepos = 448, rel_filepos = 0, 
  line_filepos = 0, userdata = 0x0, contents = 0x0, lineno = 0x0, 
  lineno_count = 0, entsize = 0, kept_section = 0x0, moving_line_filepos = 0, 
  target_index = 0, used_by_bfd = 0x888270, constructor_chain = 0x0, 
  owner = 0x8981b0, symbol = 0x888380, symbol_ptr_ptr = 0x88e6b8, map_head = {
    link_order = 0x0, s = 0x0}, map_tail = {link_order = 0x0, s = 0x0}}
flag in this case would be: 18, SEC_CODE|SEC_READONLY

for the good (non-prelinked) one:
for good one:
(gdb) print *dbfd.syms[i]
$35 = {the_bfd = 0x7df0c0, name = 0x7f3071 "call_gmon_start", value = 0, 
  flags = 9, section = 0x861e08, udata = {p = 0x0, i = 0}}
(gdb) print *dbfd.syms[i]->section
$36 = {name = 0x7db4ef ".text", id = 1182, index = 10, next = 0x861f30, 
  prev = 0x861ce0, flags = 283, user_set_vma = 1, linker_mark = 0, 
  linker_has_input = 0, gc_mark = 0, segment_mark = 0, sec_info_type = 0, 
  use_rela_p = 1, sec_flg0 = 0, sec_flg1 = 0, sec_flg2 = 0, sec_flg3 = 0, 
  sec_flg4 = 0, sec_flg5 = 0, vma = 44832, lma = 44832, size = 499832, 
  rawsize = 0, relax = 0x0, relax_count = 0, output_offset = 0, 
  output_section = 0x0, alignment_power = 4, relocation = 0x0, 
  orelocation = 0x0, reloc_count = 0, filepos = 44832, rel_filepos = 0, 
  line_filepos = 0, userdata = 0x0, contents = 0x0, lineno = 0x0, 
  lineno_count = 0, entsize = 0, kept_section = 0x0, moving_line_filepos = 0, 
  target_index = 0, used_by_bfd = 0x89dc70, constructor_chain = 0x0, 
  owner = 0x7df0c0, symbol = 0x89dd80, symbol_ptr_ptr = 0x861ef8, map_head = {
    link_order = 0x0, s = 0x0}, map_tail = {link_order = 0x0, s = 0x0}}
(gdb) print/x (*dbfd.syms[i]->section).flags
$37 = 0x11b
SEC_HAS_CONTENTS|SEC_CODE|SEC_READONLY|SEC_LOAD|SEC_ALLOC


I looked at the eu-readelf -A output of both versions of /usr/lib64/libcairo.so.2.11000.2 didn't see any differences in section headers for .text
How does bfd determine those flags?

Comment 7 Nick Clifton 2011-10-20 16:11:34 UTC
Hi Will,

  OK, how's this for a theory ...  In bfd_info::translate_debuginfo_syms() you have this bit of code to match symbols in the debuginfo file to their sections in the original library:

	if ((*it).second->vma == sym->section->vma) 
          {
 	    matched_section = (*it).second;
	    break;

  But, in prelinked libraries, the section addresses are all wildly different from the addresses that were in their original (non-prelinked) versions.  So this matching fails and the debug info symbols are not successfully translated.

  (I note that translate_debuginfo_syms() does not check to see if all of the symbols have been translated).

  So - assuming this theory is correct, I see three possible solutions:

  1. When prelinking a library, prelink the debuginfo as well.
  
     [Not really practical]

  2. If a matching section cannot be found for a debug info symbol then use a heuristic to guess at the section to which it ought to belong.

     [Possible, but probably ugly, and likely to break in the future].

  3. If a matching section cannot be found for a debug info symbol then attach it to a fake loadable code section.

     [Not sure about this.  I think that there might still be a problem with the fact that address associated with the symbol taken from the debuginfo file does not match up with the addresses used by the runtime executable].

  4. Change interesting_symbol() to allow symbols in sections that do not have the SEC_LOAD attrtibute.

    [Not sure what the consequences would be]

Your thoughts ?

Cheers
  Nick

Comment 8 Jakub Jelinek 2011-10-20 16:24:55 UTC
(In reply to comment #7)
>   So - assuming this theory is correct, I see three possible solutions:
> 
>   1. When prelinking a library, prelink the debuginfo as well.
> 
>      [Not really practical]

And unimplementable.  The *-debuginfo package doesn't need to be installed when the library is prelinked, it can be installed later on.  gdb/systemtap etc. cope with this just fine, by using proper adjustments of the addresses in debug info.  Any program needs to do such adjustment anyway, even when nothing is prelinked, because shared libraries can be mmapped at any address.
But, instead of adjusting it by the difference between the base address where the shared library has been mmapped and first PT_LOAD segment p_vaddr in the shared library (which is usually 0 for prelinked shared libraries) you want to
adjust by the difference between the base address where the shared library has been mmapped and first PT_LOAD segment p_vaddr in the *.debug ELF file (which is usually, but not always, 0).

Comment 9 Maynard Johnson 2012-02-10 16:36:44 UTC
I posted a patch for this problem to the oprofile-list on Jan 6 (http://marc.info/?l=oprofile-list&m=132589204604609&w=2), but no one has tested and/or reviewed it yet.  Note that the subject line for that posting is not very descriptive.  A better description for the patch would be "Fix debuginfo processing in prelink environments".  Can anyone try this patch to see if it fixes the problem raised in this bug?  Thanks.

Comment 10 Brent Baude 2012-02-16 16:18:28 UTC
i bumped into this today trying to oprofile something on Fedora 16 ppc64.  Any chance we could resolve this?

Comment 11 Karsten Hopp 2012-02-16 16:31:20 UTC
William, Nick: could you please review the patch that was posted on the oprofile-list ?

Comment 12 Nick Clifton 2012-02-22 17:17:51 UTC
Hi Karsten,

  OK, it has been a long time since I looked at this bug, and I am not an expert on oprofile at all, but...

  1) Is the patch truly intended just for ppc64 debuginfo files ?  I thought that the problem was generic, not architecture specific.

  2) Where you remove the test of a sections's flags that made sure that it included SEC_CODE, I would add a comment saying non-code sections are being allowed.  Even just a reference to this BZ would be enough.

  3) Is the removal of the call to translate_debuginfo_syms() necessary ?
(Ie could this cause problems when separate debuginfo files are *not* being used ?)

Cheers
  Nick

Comment 13 Maynard Johnson 2012-02-22 18:55:46 UTC
(In reply to comment #12)
> Hi Karsten,
> 
>   OK, it has been a long time since I looked at this bug, and I am not an
> expert on oprofile at all, but...
> 
>   1) Is the patch truly intended just for ppc64 debuginfo files ?  I thought
> that the problem was generic, not architecture specific.
No, it is *not* ppc64-specific.  Sorry if there was some confusion.  I posted two patches to the oprofile-list on Jan 6: one that *was* ppc64-specific and another that was not.  It's the second patch that I referenced above in comment #9.  The patch name was a poor choice on my part.
> 
>   2) Where you remove the test of a sections's flags that made sure that it
> included SEC_CODE, I would add a comment saying non-code sections are being
> allowed.  Even just a reference to this BZ would be enough.
OK.
> 
>   3) Is the removal of the call to translate_debuginfo_syms() necessary ?
> (Ie could this cause problems when separate debuginfo files are *not* being
> used ?)
Some history:  The translate_debuginfo_syms function was added specifically for ppc64 for a problem I reported in Feb 2009.  Originally, oprofile did not filter out symbols based on the SEC_CODE flag.  That was added for another ppc64-specific problem I reported to the list in June 2009.  But as we soon found out, adding the SEC_CODE filter broke debuginfo processing for *all* architectures.  Modifying oprofile to call translate_debuginfo_syms for *all* architectures was done at the request of John Levon (OProfile creator) to solve that.  I was not in favor of that decision, and am still not.  The patch under review reverses these two decisions (the SEC_CODE filter and calling translate_debuginfo_syms for all architectures).  If this patch is accepted, it will open up that bug again (which I reported in June 2009.  But there's an alternate fix for that bug (instead of the SEC_CODE filter), which I would post to the list if this current patch is accepted.

So, the short answer to your question of whether it would cause problems to remove translate_debuginfo_syms is "no" -- it actually makes for simpler code for the non-ppc64 architectures.

Thanks so much for taking the time to look at the patch.  Please let me know if you have any further questions or if you're satisfied that I've addressed your concerns.
> 
> Cheers
>   Nick

Comment 14 Nick Clifton 2012-02-22 21:04:29 UTC
Hi Maynard,

  Ok, then, I am happy with the patch (modulo adding a comment).  Please note of course that I cannot approve the patch, since I am not an oprofile maintainer, but from a binutils point of view I am happy with it.

Cheers
  Nick

Comment 15 IBM Bug Proxy 2012-02-22 21:59:51 UTC
Tested this on my system and it resolved the reported problem.

Comment 17 Tomáš Trnka 2012-10-17 11:22:22 UTC
The problem with loading debuginfo symbols still exists in stock F17 oprofile-0.9.7-2.fc17. Applying the upstream fix (git commit ab45a0cc5d1cf522c1aef8f22ed512a9aae0dc1c applies cleanly) makes debuginfo symbols work again (at least on x86_64). Can anybody please push an update for oprofile including the fix? Or alternatively upgrading to oprofile 0.9.8 would solve the problem, too (in addition to bringing exciting new features like non-root profiling).

Comment 18 Fedora End Of Life 2013-01-16 20:51:40 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Fedora End Of Life 2013-02-26 15:11:03 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.