Description of problem: Sysprof and Oprofile fail to read symbol files, even with debuginfo packages installed. Even with debuginfo-packages installed, the information presented is equal to what is shown without those packages installed - which makes both system-wide profiles much less useful. Worked with Fedora-14, regressed with Fedora 15 and is still broken in Fedora 16 Beta1. Version-Release number of selected component (if applicable): Fedora 16 Beta sysprof-1.1.8-1.fc16.i686 oprofile-0.9.6-21.fc16.i686 binutils-2.21.53.0.1-2.fc16.i686 kernel-3.1.0-0.rc9.git0.0.fc16.i686 How reproducible: Always Steps to Reproduce: 1. Install all required debuginfo-packages 2. Start profiling session with oprofile or sysprof 3. Actual results: Symbol files not read, therefor the information presented is quite sparse (e.g. no exact call-trees) Expected results: Both, oprofile and sysprof should be able to read the installed symbol files Additional info:
This might be a problem with oprofile AND sysprof, but it seem likely that binutils is culprit because they both fail. I was able to replicate this on f15 (AND f14) x86-64 vm with oprofile. I was able to replicate with the following commands run as root: modprobe oprofile timer=1 opcontrol --reset opcontrol --setup --no-vmlinux --separate=library opcontrol --start # run things for a while to get some samples for /usr/bin/gnome-terminal opcontrol --shutdown debuginfo-install gnome-terminal -y opreport -t1 -l /usr/bin/gnome-terminal Output should have breakdown of time spent in various routines in various libraries. However, get library info lumped together in opreport: $ opreport -t1 -l /usr/bin/gnome-terminal Overflow stats not available warning: [vdso] (tgid:1642 range:0x7fff387ff000-0x7fff38800000) could not be found. CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % image name symbol name 656 23.5041 libcairo.so.2.11000.2 /usr/lib64/libcairo.so.2.11000.2 527 18.8821 libglib-2.0.so.0.2800.8 /lib64/libglib-2.0.so.0.2800.8 347 12.4328 libgobject-2.0.so.0.2800.8 /lib64/libgobject-2.0.so.0.2800.8 259 9.2798 libgtk-3.so.0.0.12 /usr/lib64/libgtk-3.so.0.0.12 198 7.0942 libvte2_90.so.9.2800.1 /usr/lib64/libvte2_90.so.9.2800.1 156 5.5894 libgdk-3.so.0.0.12 /usr/lib64/libgdk-3.so.0.0.12 53 1.8990 libX11.so.6.3.0 /usr/lib64/libX11.so.6.3.0 41 1.4690 libc-2.14.so _int_malloc 37 1.3257 libc-2.14.so __strcmp_sse2 35 1.2540 libpthread-2.14.so pthread_mutex_unlock 32 1.1465 libpthread-2.14.so pthread_mutex_lock Even though the debuginfo are installed $ rpm -q cairo cairo-debuginfo glib2 glib2-debuginfo cairo-1.10.2-3.fc15.x86_64 cairo-debuginfo-1.10.2-3.fc15.x86_64 glib2-2.28.8-1.fc15.x86_64 glib2-debuginfo-2.28.8-1.fc15.x86_64 Note that some libraries do have functions, such as libc-2.14.so and libpthread-2.14.so. Looking through the output of "opreport --verbose=all -t1 -l /usr/bin/gnome-terminal" see that the libcairo.so debuginfo is found but it looks like the symbols are being filtered out: op_bfd ctor for /usr/lib64/libcairo.so.2.11000.2 bfd_info::get_symbols() for /usr/lib64/libcairo.so.2.11000.2 bfd_get_symtab_upper_bound: 8 bfd_canonicalize_symtab: 0 fetching .gnu_debuglink section .gnu_debuglink section has size 20 .gnu_debuglink filename is libcairo.so.2.11000.2.debug looking for debugging file libcairo.so.2.11000.2.debug with crc32 = 31da1180 found /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug with crc32 = 31da1180 now loading: /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug bfd_info::get_symbols() for /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug bfd_get_symtab_upper_bound: 21456 bfd_canonicalize_symtab: 2681 number of symbols before filtering 1 number of symbols now 1 start_offset is now 0 symbol /usr/lib64/libcairo.so.2.11000.2, value 0 start 0, end a9028
Hi Will, This probably is a binutils bug, but I am going to need your help to track it down... I do not think that the problem is with the symbol filtering, since the verbose log is reporting that the number of symbols *before* filtering is 1. I do notice something rather suspicious however. In the cases where the number of symbols before filtering is 1 then the log either reports: failed to process separate debug file number of symbols before filtering 1 Or else (to take one example): bfd_info::get_symbols() for /usr/lib/debug/lib64/libgio-2.0.so.0.2800.8.debug bfd_get_symtab_upper_bound: 50960 bfd_canonicalize_symtab: 6369 number of symbols before filtering 1 Which implies that there are 50960 bytes of data in the debug info file and that after reading this in there are 6369 symbols that have been processed. So why does the symbol count then appear to be reset to 1 ? Strangely in the cases where, I presume, the symbols are successfully loaded the log reports: bfd_info::get_symbols() for /usr/lib/debug/lib64/libc-2.14.so.debug bfd_get_symtab_upper_bound: 0 number of symbols before filtering 3308 Which suggests that there was no data found in the debug info file and that the symbols have been (magically) found elsewhere. So, in the oprofile sources, what happens between bfd_info::get_symbols() and op_bfd::add_symbols() ? Cheers Nick
The source code for oprofile for the oprofile in fc15 is pretty much same as upstream in libutils++. Can view the code in the git tree at: http://oprofile.git.sourceforge.net/git/gitweb.cgi?p=oprofile/oprofile;a=tree;f=libutil%2B%2B; Adding some annotation where the output messages to describe where they are coming from and what is going on: now loading: /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug bfd_info::get_symbols() for /usr/lib/debug/usr/lib64/libcairo.so.2.11000.2.debug bfd_get_symtab_upper_bound: 21456 -above from libutils++/bfd_support.cpp bfd_info::get_symbols() -bfd_canonicalize_symtab(abfd, syms.get()) call between this and the next. bfd_canonicalize_symtab: 2681 -also from libutils++/bfd_support.cpp bfd_info::get_symbols() also -a call is made to libutils++/bfd_support.cpp:translate_debuginfo_syms() in get_symbols() and that might be affecting the symbols number of symbols before filtering 1 -from libutils++/op_bfd.cpp op_bfd::add_symbols number of symbols now 1 start_offset is now 0 symbol /usr/lib64/libcairo.so.2.11000.2, value 0 start 0, end a9028 Time to look and see how bfd_info::translate_debuginfo_syms() operates.
It looks like things are not working well with prelink'ed files. Undid the prelink on the f14 machine with: prelink -u --all Then reran the reproducer. Got output for the various functions in libcairo.so.2.11000.2 and other shared libs: # opreport -l -t1 /usr/bin/gnome-terminal CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % image name symbol name 18 4.0449 libcairo.so.2.11000.2 _cairo_bentley_ottmann_tessellate_polygon 9 2.0225 ld-2.13.so do_lookup_x 9 2.0225 libglib-2.0.so.0.2600.0 g_atomic_pointer_get 7 1.5730 libc-2.13.so malloc 7 1.5730 libglib-2.0.so.0.2600.0 g_main_context_check 7 1.5730 libvte.so.9.2600.1 vte_terminal_draw_rows.clone.6 6 1.3483 libvte.so.9.2600.1 vte_terminal_determine_colors_internal 5 1.1236 libc-2.13.so __strcmp_sse2 5 1.1236 libc-2.13.so _int_free 5 1.1236 libc-2.13.so free 5 1.1236 libglib-2.0.so.0.2600.0 g_string_insert_unichar 5 1.1236 libgobject-2.0.so.0.2600.0 g_type_check_instance_cast 5 1.1236 libgobject-2.0.so.0.2600.0 g_type_check_instance_is_a
Clemens, the work around at the moment is to undo the prelinking with: prelink -u --all Note that the prelinking changes the files and oprofile is going to notice that, so already loaded executable are not going to get oprofile samples against them because of difference beteen what is load and what is in the file. Maybe reboot the machine to ensure that everything is running the non-prelinked versions of the shared libraries. Still need to find out why oprofile and sysprof getting messed up because of the prelink of the shared libraries.
Seems that the interesting_symbol() function is filtering out all the symbols in the prelinked shared libs. Seems that the following statement in interesting_symbol() is filtering out everything in the prelinked libraries: if (!(sym->section->flags & SEC_LOAD)) return false; Used gdb with conditional breakpoints to stop opreport in the same place, op_bfd::get_symbols() reading in /usr/lib64/libcairo.so.2.11000.2 looking at "call_gmon_start" symbol just before the call to interesting_symbol at line 232. for problem (prelinked) one: (gdb) print *dbfd.syms[i] $33 = {the_bfd = 0x8981b0, name = 0x7dd821 "call_gmon_start", value = 0, flags = 9, section = 0x88e5c8, udata = {p = 0x0, i = 0}} (gdb) print *dbfd.syms[i]->section $34 = {name = 0x8c1cae ".text", id = 1071, index = 10, next = 0x88e6f0, prev = 0x88e4a0, flags = 25, user_set_vma = 1, linker_mark = 0, linker_has_input = 0, gc_mark = 0, segment_mark = 0, sec_info_type = 0, use_rela_p = 1, sec_flg0 = 0, sec_flg1 = 0, sec_flg2 = 0, sec_flg3 = 0, sec_flg4 = 0, sec_flg5 = 0, vma = 44832, lma = 44832, size = 499832, rawsize = 0, relax = 0x0, relax_count = 0, output_offset = 0, output_section = 0x0, alignment_power = 4, relocation = 0x0, orelocation = 0x0, reloc_count = 0, filepos = 448, rel_filepos = 0, line_filepos = 0, userdata = 0x0, contents = 0x0, lineno = 0x0, lineno_count = 0, entsize = 0, kept_section = 0x0, moving_line_filepos = 0, target_index = 0, used_by_bfd = 0x888270, constructor_chain = 0x0, owner = 0x8981b0, symbol = 0x888380, symbol_ptr_ptr = 0x88e6b8, map_head = { link_order = 0x0, s = 0x0}, map_tail = {link_order = 0x0, s = 0x0}} flag in this case would be: 18, SEC_CODE|SEC_READONLY for the good (non-prelinked) one: for good one: (gdb) print *dbfd.syms[i] $35 = {the_bfd = 0x7df0c0, name = 0x7f3071 "call_gmon_start", value = 0, flags = 9, section = 0x861e08, udata = {p = 0x0, i = 0}} (gdb) print *dbfd.syms[i]->section $36 = {name = 0x7db4ef ".text", id = 1182, index = 10, next = 0x861f30, prev = 0x861ce0, flags = 283, user_set_vma = 1, linker_mark = 0, linker_has_input = 0, gc_mark = 0, segment_mark = 0, sec_info_type = 0, use_rela_p = 1, sec_flg0 = 0, sec_flg1 = 0, sec_flg2 = 0, sec_flg3 = 0, sec_flg4 = 0, sec_flg5 = 0, vma = 44832, lma = 44832, size = 499832, rawsize = 0, relax = 0x0, relax_count = 0, output_offset = 0, output_section = 0x0, alignment_power = 4, relocation = 0x0, orelocation = 0x0, reloc_count = 0, filepos = 44832, rel_filepos = 0, line_filepos = 0, userdata = 0x0, contents = 0x0, lineno = 0x0, lineno_count = 0, entsize = 0, kept_section = 0x0, moving_line_filepos = 0, target_index = 0, used_by_bfd = 0x89dc70, constructor_chain = 0x0, owner = 0x7df0c0, symbol = 0x89dd80, symbol_ptr_ptr = 0x861ef8, map_head = { link_order = 0x0, s = 0x0}, map_tail = {link_order = 0x0, s = 0x0}} (gdb) print/x (*dbfd.syms[i]->section).flags $37 = 0x11b SEC_HAS_CONTENTS|SEC_CODE|SEC_READONLY|SEC_LOAD|SEC_ALLOC I looked at the eu-readelf -A output of both versions of /usr/lib64/libcairo.so.2.11000.2 didn't see any differences in section headers for .text How does bfd determine those flags?
Hi Will, OK, how's this for a theory ... In bfd_info::translate_debuginfo_syms() you have this bit of code to match symbols in the debuginfo file to their sections in the original library: if ((*it).second->vma == sym->section->vma) { matched_section = (*it).second; break; But, in prelinked libraries, the section addresses are all wildly different from the addresses that were in their original (non-prelinked) versions. So this matching fails and the debug info symbols are not successfully translated. (I note that translate_debuginfo_syms() does not check to see if all of the symbols have been translated). So - assuming this theory is correct, I see three possible solutions: 1. When prelinking a library, prelink the debuginfo as well. [Not really practical] 2. If a matching section cannot be found for a debug info symbol then use a heuristic to guess at the section to which it ought to belong. [Possible, but probably ugly, and likely to break in the future]. 3. If a matching section cannot be found for a debug info symbol then attach it to a fake loadable code section. [Not sure about this. I think that there might still be a problem with the fact that address associated with the symbol taken from the debuginfo file does not match up with the addresses used by the runtime executable]. 4. Change interesting_symbol() to allow symbols in sections that do not have the SEC_LOAD attrtibute. [Not sure what the consequences would be] Your thoughts ? Cheers Nick
(In reply to comment #7) > So - assuming this theory is correct, I see three possible solutions: > > 1. When prelinking a library, prelink the debuginfo as well. > > [Not really practical] And unimplementable. The *-debuginfo package doesn't need to be installed when the library is prelinked, it can be installed later on. gdb/systemtap etc. cope with this just fine, by using proper adjustments of the addresses in debug info. Any program needs to do such adjustment anyway, even when nothing is prelinked, because shared libraries can be mmapped at any address. But, instead of adjusting it by the difference between the base address where the shared library has been mmapped and first PT_LOAD segment p_vaddr in the shared library (which is usually 0 for prelinked shared libraries) you want to adjust by the difference between the base address where the shared library has been mmapped and first PT_LOAD segment p_vaddr in the *.debug ELF file (which is usually, but not always, 0).
I posted a patch for this problem to the oprofile-list on Jan 6 (http://marc.info/?l=oprofile-list&m=132589204604609&w=2), but no one has tested and/or reviewed it yet. Note that the subject line for that posting is not very descriptive. A better description for the patch would be "Fix debuginfo processing in prelink environments". Can anyone try this patch to see if it fixes the problem raised in this bug? Thanks.
i bumped into this today trying to oprofile something on Fedora 16 ppc64. Any chance we could resolve this?
William, Nick: could you please review the patch that was posted on the oprofile-list ?
Hi Karsten, OK, it has been a long time since I looked at this bug, and I am not an expert on oprofile at all, but... 1) Is the patch truly intended just for ppc64 debuginfo files ? I thought that the problem was generic, not architecture specific. 2) Where you remove the test of a sections's flags that made sure that it included SEC_CODE, I would add a comment saying non-code sections are being allowed. Even just a reference to this BZ would be enough. 3) Is the removal of the call to translate_debuginfo_syms() necessary ? (Ie could this cause problems when separate debuginfo files are *not* being used ?) Cheers Nick
(In reply to comment #12) > Hi Karsten, > > OK, it has been a long time since I looked at this bug, and I am not an > expert on oprofile at all, but... > > 1) Is the patch truly intended just for ppc64 debuginfo files ? I thought > that the problem was generic, not architecture specific. No, it is *not* ppc64-specific. Sorry if there was some confusion. I posted two patches to the oprofile-list on Jan 6: one that *was* ppc64-specific and another that was not. It's the second patch that I referenced above in comment #9. The patch name was a poor choice on my part. > > 2) Where you remove the test of a sections's flags that made sure that it > included SEC_CODE, I would add a comment saying non-code sections are being > allowed. Even just a reference to this BZ would be enough. OK. > > 3) Is the removal of the call to translate_debuginfo_syms() necessary ? > (Ie could this cause problems when separate debuginfo files are *not* being > used ?) Some history: The translate_debuginfo_syms function was added specifically for ppc64 for a problem I reported in Feb 2009. Originally, oprofile did not filter out symbols based on the SEC_CODE flag. That was added for another ppc64-specific problem I reported to the list in June 2009. But as we soon found out, adding the SEC_CODE filter broke debuginfo processing for *all* architectures. Modifying oprofile to call translate_debuginfo_syms for *all* architectures was done at the request of John Levon (OProfile creator) to solve that. I was not in favor of that decision, and am still not. The patch under review reverses these two decisions (the SEC_CODE filter and calling translate_debuginfo_syms for all architectures). If this patch is accepted, it will open up that bug again (which I reported in June 2009. But there's an alternate fix for that bug (instead of the SEC_CODE filter), which I would post to the list if this current patch is accepted. So, the short answer to your question of whether it would cause problems to remove translate_debuginfo_syms is "no" -- it actually makes for simpler code for the non-ppc64 architectures. Thanks so much for taking the time to look at the patch. Please let me know if you have any further questions or if you're satisfied that I've addressed your concerns. > > Cheers > Nick
Hi Maynard, Ok, then, I am happy with the patch (modulo adding a comment). Please note of course that I cannot approve the patch, since I am not an oprofile maintainer, but from a binutils point of view I am happy with it. Cheers Nick
Tested this on my system and it resolved the reported problem.
Changing component since this is being proposed to be fixed in oprofile. Patches already included upstream: http://oprofile.git.sourceforge.net/git/gitweb.cgi?p=oprofile/oprofile;a=commitdiff;h=ab45a0cc5d1cf522c1aef8f22ed512a9aae0dc1c http://oprofile.git.sourceforge.net/git/gitweb.cgi?p=oprofile/oprofile;a=commitdiff;h=017a947cc8c77b398095789100ef831f8175cdf9
The problem with loading debuginfo symbols still exists in stock F17 oprofile-0.9.7-2.fc17. Applying the upstream fix (git commit ab45a0cc5d1cf522c1aef8f22ed512a9aae0dc1c applies cleanly) makes debuginfo symbols work again (at least on x86_64). Can anybody please push an update for oprofile including the fix? Or alternatively upgrading to oprofile 0.9.8 would solve the problem, too (in addition to bringing exciting new features like non-root profiling).
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.