Bug 554639 - opreport does not seem to support separate debuginfo
Summary: opreport does not seem to support separate debuginfo
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: oprofile
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: William Cohen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 591538
Reported: 2010-01-12 08:39 UTC by r6144
Modified: 2010-05-12 14:21 UTC

Fixed In Version: oprofile-0.9.6-5.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 591538 (view as bug list)
Environment:
Last Closed: 2010-04-22 22:53:29 UTC
Type: ---
Embargoed:


Attachments
The aforementioned patch in 0.9.6 (3.43 KB, patch)
2010-01-23 07:45 UTC, r6144
Workaround that simply ignores SEC_LOAD (307 bytes, patch)
2010-01-23 08:08 UTC, r6144
Reverted a patch for overlay symbols for Cell SPE applications (3.31 KB, patch)
2010-04-07 15:37 UTC, William Cohen

Description r6144 2010-01-12 08:39:38 UTC
Description of problem:
oprofile does not seem to understand separate debuginfo.  For example, even after I install scim-bridge-debuginfo, opreport -l does not show detailed symbol information from scim-bridge.

There seems to be a related fix (the patch changing bfd_support.*) in oprofile 0.9.6.  However, the resulting opreport does not work at all on my kernel, either when I rebuild and install the entire oprofile-0.9.6-2.fc13.src.rpm, or backport just the bfd_support patches to 0.9.5-4.fc12.

Overall, the support of separate debuginfo in libbfd-based applications still seems to be quite messy.  I can't even persuade addr2line to make use of the information, although eu-addr2line does.

Version-Release number of selected component (if applicable):
kernel-2.6.31.9-174.fc12.x86_64
oprofile-0.9.5-4.fc12.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install the debuginfo package of some user-space application or library, e.g. scim-bridge
2. Use oprofile to collect some samples in the application.
3. Run opreport -l
  
Actual results:
Only the executable or library names are shown in the "symbol name" column, even when the relevant debuginfo has been installed.

Expected results:
The relevant symbol name should be shown.

Additional info:

Comment 1 r6144 2010-01-23 07:45:03 UTC
Created attachment 386296 [details]
The aforementioned patch in 0.9.6

This is the aforementioned patch in 0.9.6.  It seems to assume that corresponding sections have the same number in the executable/library and in the debuginfo file.  However, this is not true for executables on Fedora 12.  For example, in scim-bridge-0.4.16-2.fc12.x86_64, .dynstr is section 30 in /usr/bin/scim-bridge according to readelf -SW, but it is section 6 in /usr/lib/debug/usr/bin/scim-bridge.debug.

Comment 2 r6144 2010-01-23 08:08:38 UTC
Created attachment 386297 [details]
Workaround that simply ignores SEC_LOAD

This patch to 0.9.5 simply removes the SEC_LOAD check in interesting_symbol().  It seems to work for me, although I'm not sure of its implications.

(SEC_LOAD is usually not set for sections in the .debug file because the sections are NOBITS there; see bfd/elf.c:_bfd_elf_make_section_from_shdr().  This makes opreport etc. ignore all symbols from the .debug file.  For certain libraries such as libc, the library itself is not stripped, so this inability to load separate debuginfo may not be as apparent.)
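As a rough illustration of the effect described above, here is a small self-contained C++ model. It is not oprofile code: `SectionModel`, `keep_symbol`, and `kSecLoad` are hypothetical names made up for this sketch, and real code would test the SEC_LOAD flag from <bfd.h> on the symbol's section instead. It only shows why a symbol whose section lacks SEC_LOAD is dropped when the check is present and kept when the check is removed.

```cpp
#include <cassert>
#include <string>

// Model flag value for this sketch only; real code would use the
// SEC_LOAD macro provided by <bfd.h>.
constexpr unsigned kSecLoad = 0x2;

// Hypothetical stand-in for the section a symbol belongs to.
struct SectionModel {
    std::string name;
    unsigned flags;
};

// Simplified filter. `require_load` models the SEC_LOAD test that the
// workaround removes; any other checks done by the real
// interesting_symbol() are deliberately left out of this model.
bool keep_symbol(const SectionModel& sec, bool require_load)
{
    assert(!sec.name.empty());
    if (require_load && !(sec.flags & kSecLoad))
        return false;  // e.g. a NOBITS section in the .debug file
    return true;
}
```

With `require_load` set to true, a `.text` section modelled without `kSecLoad` (as in the .debug file) is rejected, while the same section modelled with `kSecLoad` (as in the unstripped image) is accepted.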

Comment 3 Fedora Update System 2010-04-05 20:22:07 UTC
oprofile-0.9.6-2.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/oprofile-0.9.6-2.fc12

Comment 4 r6144 2010-04-06 03:26:19 UTC
oprofile-0.9.6-2.fc12 is broken in the way described in Comment 1.  opreport simply dies with "opreport error: profile_t::samples_range(): start > end something wrong with kernel or module layout ?" whenever the profile includes any executable with separate debuginfo, so the end-user experience is even worse.

For example, to reproduce this bug with gcc:

1. Install oprofile-0.9.6-2.fc12

2. Install gcc-4.4.3-4.fc12.x86_64 and gcc-debuginfo-4.4.3-4.fc12.x86_64

3. Run the following commands:

# readelf -SW /usr/libexec/gcc/x86_64-redhat-linux/4.4.3/cc1 | grep dynstr
  [19] .dynstr           STRTAB          0000000000c5459c 85459c 000849 00   A  0   0  1
# readelf -SW /usr/lib/debug/usr/libexec/gcc/x86_64-redhat-linux/4.4.3/cc1.debug | grep dynstr
  [ 6] .dynstr           NOBITS          0000000000401818 000248 00082d 00   A  0   0  1
# opcontrol --reset && opcontrol --start && sleep 20 && opcontrol --stop
Signalling daemon... done
Profiler running.

(Run gcc in the meantime, e.g. by building something)

Stopping profiling.

# opreport -l | less
...
opreport error: profile_t::samples_range(): start > end something wrong with kernel or module layout ?
please report problem to oprofile-list.net

# opreport.my -l | less
(This is the 0.9.5-4 opreport with the patch in comment 2, which appears to work)
...
samples  %        image name               app name                 symbol name
9568      1.5946  libc-2.11.1.so           cc1                      _int_malloc
8816      1.4693  cc1                      cc1                      _cpp_lex_direct
8687      1.4478  cc1                      cc1                      ht_lookup_with_hash
6518      1.0863  cc1                      cc1                      htab_find_slot_with_hash
...

Comment 5 William Cohen 2010-04-06 19:21:39 UTC
Eliminating the SEC_LOAD check in 0.9.5 seems to remove the symptom. However, the same patch doesn't help with oprofile-0.9.6, so it is likely not dealing with the root cause.

Comment 6 r6144 2010-04-07 03:49:36 UTC
Well, since the section numbers can differ between the image and the debuginfo files, I guess we have to fix bfd_info::translate_debuginfo_syms() so that it compares section names instead.

An alternative is to remove the call to translate_debuginfo_syms() together with the SEC_LOAD check, thus reverting to 0.9.5 behavior.  Since translate_debuginfo_syms() seems to be used elsewhere, though, it isn't a thorough fix.
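To make the name-based comparison suggested in the first paragraph concrete, here is a minimal sketch. It is only an illustration under stated assumptions: a file is modelled as a plain list of section names, and `SectionNames` and `match_section_by_name` are hypothetical names, not part of oprofile or libbfd. It does not show how translate_debuginfo_syms() is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a file is just the ordered list of its section names.
using SectionNames = std::vector<std::string>;

// Given a section index in the image, return the index of the section with
// the same name in the debuginfo file, or std::nullopt if none matches.
std::optional<std::size_t> match_section_by_name(const SectionNames& image,
                                                 std::size_t image_index,
                                                 const SectionNames& debug)
{
    assert(image_index < image.size());
    const std::string& wanted = image[image_index];
    for (std::size_t i = 0; i < debug.size(); ++i) {
        if (debug[i] == wanted)
            return i;  // same name, possibly a different index
    }
    return std::nullopt;
}
```

The point of the lookup is that the result depends only on the name, so it stays correct when .dynstr sits at one index in the image and at a different index in the .debug file.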

Comment 7 William Cohen 2010-04-07 13:48:20 UTC
The difference in the section numbers is unlikely to be the problem. I compiled the stock oprofile 0.9.4 with the oprofile-basename.patch configuration patch. That build is able to read separate debuginfo and display the results.

I ran opreport with "--verbose=all" to get a better idea of what is going on.

For the good run (oprofile 0.9.4), the output contains the following:

...
bfd_info::get_symbols() for /usr/lib/debug/bin/rm.debug
bfd_get_symtab_upper_bound: 2688
bfd_canonicalize_symtab: 335
number of symbols before filtering 134
number of symbols now 134
...
symbol atexit, value 7270
start 8d90, end 8db0
in section .text, filepos 1b20
symbol __do_global_ctors_aux, value 7290
start 8db0, end 8de8
in section .text, filepos 1b20
symbol _fini, value 0
start 8de8, end e058
in section .fini, filepos 8de8

For the bad run (upstream oprofile), the output contains the following:

...
now loading: /usr/lib/debug/bin/rm.debug
bfd_info::get_symbols() for /usr/lib/debug/bin/rm.debug
bfd_get_symtab_upper_bound: 2688
bfd_canonicalize_symtab: 335
number of symbols before filtering 151
number of symbols now 151
...
symbol __libc_csu_init, value 71e0
start 88a8, end 8938
in section .plt, filepos 16c8
symbol atexit, value 7270
start 8938, end 8958
in section .plt, filepos 16c8
symbol __do_global_ctors_aux, value 7290
start 8958, end 1b18
in section .plt, filepos 16c8

The end value of 1b18 for symbol __do_global_ctors_aux looks pretty suspect: it is lower than the symbol's start of 8958.

Comment 8 William Cohen 2010-04-07 15:37:50 UTC
Created attachment 405000 [details]
Reverted a patch for overlay symbols for Cell SPE applications

Reverting this patch allows oprofile to work with the separate debuginfo file. The patch changes how the end of a symbol and the resulting size are computed.

It adds the following function:

unsigned long op_bfd_symbol::symbol_endpos(void) const
{
	return bfd_symbol->section->filepos + bfd_symbol->section->size;
}

The use of this function is causing oprofile to fail.
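A small worked example, using the numbers from the bad run in comment 7, shows how this end position can fall below a symbol's start. This is a sketch, not oprofile code: `SectionLayout`, `section_endpos`, and `range_is_valid` are hypothetical names, and the section size of 0x450 is not printed in the log. It is inferred on the assumption that the reported end of 1b18 is filepos (16c8) plus size.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the two section fields used by symbol_endpos().
struct SectionLayout {
    unsigned long filepos;  // file offset of the section
    unsigned long size;     // section size in bytes
};

// Same arithmetic as the function quoted above: file offset plus size.
unsigned long section_endpos(const SectionLayout& sec)
{
    assert(sec.size <= ULONG_MAX - sec.filepos);  // guard against overflow
    return sec.filepos + sec.size;
}

// A symbol range is only usable if it does not end before it starts.
bool range_is_valid(unsigned long start, unsigned long end)
{
    return start <= end;
}
```

With filepos 0x16c8 and the inferred size 0x450, `section_endpos` yields 0x1b18, and `range_is_valid(0x8958, 0x1b18)` is false, which matches the "start > end" condition named in the opreport error message.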

Comment 9 Fedora Update System 2010-04-09 01:22:47 UTC
oprofile-0.9.6-2.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update oprofile'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/oprofile-0.9.6-2.fc12

Comment 10 William Cohen 2010-04-14 14:45:12 UTC
There has been discussion on the oprofile mailing list about this problem.
A pointer to the oprofile email containing a proposed patch for this:

http://marc.info/?l=oprofile-list&m=127119391506908&w=2

Comment 11 Fedora Update System 2010-04-14 19:06:45 UTC
oprofile-0.9.6-5.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/oprofile-0.9.6-5.fc12

Comment 12 Fedora Update System 2010-04-16 23:32:08 UTC
oprofile-0.9.6-5.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update oprofile'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/oprofile-0.9.6-5.fc12

Comment 13 Fedora Update System 2010-04-22 22:53:23 UTC
oprofile-0.9.6-5.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

