Description of problem:
I am trying to profile firefox CPU usage using perf record/report. Debuginfo for firefox and all of its dependencies is installed. After recording for a short time (30-45s) and running perf report, if I navigate to Annotate on the symbol with the highest % overhead (it doesn't matter which, but it happens to be in libxul in this case), the perf UI freezes for a really long time. top tells me objdump is using 100% CPU for several minutes. Profiling objdump itself reveals the annotated perf output below for libbfd:comp_unit_contains_address.part.13, which is listed in perf report as having >95% overhead.

          │     arange = &unit->arange;
          │       add    $0x18,%rdi
          │       nop
          │     do
          │       {
          │         if (addr >= arange->low && addr < arange->high)
    96.18 │       cmp    %rsi,0x8(%rdi)
     0.01 │     ↓ ja     b34
     2.16 │       cmp    0x10(%rdi),%rsi
          │     ↓ jb     b40
          │           return TRUE;
          │         arange = arange->next;
     1.37 │       mov    (%rdi),%rdi

Version-Release number of selected component (if applicable):
libbfd-2.26.1-1.fc25

How reproducible:
Always

Steps to Reproduce:
1. Install perf and firefox-debuginfo, and run firefox with 2-3 tabs loaded.
2. Profile using perf record -p <FF_PID> for 30s or more. Run the report with perf report.
3. Click on the highest overhead listing and then click Annotate.

Actual results:
The perf UI freezes. objdump, run as root, then takes 100% CPU for a really long time.

Expected results:
1) objdump should be faster, or
2) at least the results should be cached, so that going back and forth in the perf UI doesn't make it freeze every time.

Additional info:
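For readers who do not have the libbfd sources at hand, the hot loop in the annotated output can be approximated as follows. This is only a sketch reconstructed from the source lines and offsets visible above: the field names low, high and next come from the quoted source, while the struct layouts, the loop termination condition and the function name unit_contains_address are assumptions made for illustration, not the actual libbfd definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a libbfd address range node.
   The field order is inferred from the offsets in the disassembly
   (next at 0x0, low at 0x8, high at 0x10); the real struct may differ. */
struct arange {
    struct arange *next;
    uint64_t low;
    uint64_t high;
};

/* Hypothetical compilation unit: the first range is embedded in the
   unit and any further ranges hang off it as a singly linked list. */
struct comp_unit {
    struct arange arange;
};

/* Linear walk over every range of one unit. The cost is O(n) per
   lookup, so a unit with many ranges, queried once per address,
   spends nearly all of its time in the comparison inside this loop. */
bool unit_contains_address(const struct comp_unit *unit, uint64_t addr)
{
    assert(unit != NULL);
    const struct arange *arange = &unit->arange;

    do {
        if (addr >= arange->low && addr < arange->high)
            return true;
        arange = arange->next;
    } while (arange != NULL);   /* assumed terminator, not shown in the report */

    return false;
}
```

The 96% sample count on the first cmp is consistent with each iteration loading a list node from memory that is probably not in cache, which is typical of long linked list walks.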
Hi Parag,

Have you tried the latest development binutils sources? A patch was recently contributed to significantly improve objdump's performance, and this might be what you are looking for:

https://sourceware.org/ml/binutils/2016-11/msg00050.html

Cheers
Nick
Hi Nick - thanks for the pointer, it looks promising. I need to find out whether the rawhide version of binutils has that patch integrated - I'm not sure I want to rebuild from patched sources right now.
(In reply to Parag Warudkar from comment #2)
> I need to find out if rawhide version of binutils has that patch integrated

It doesn't, yet ... I will ping you when it is in.

Cheers
Nick
The patch is now in. Please give binutils-2.27-12.fc26 a try...
No change with binutils-2.27-12.fc26 -

objdump -v
GNU objdump version 2.27-12.fc26

          │     arange = &unit->arange;
          │       add    $0x18,%rdi
          │       nop
          │     do
          │       {
          │         if (addr >= arange->low && addr < arange->high)
    96.15 │       cmp    %rsi,0x8(%rdi)
     0.01 │     ↓ ja     b64
     2.20 │       cmp    0x10(%rdi),%rsi
     0.00 │     ↓ jb     b70
          │           return TRUE;
          │         arange = arange->next;
     1.34 │       mov    (%rdi),%rdi
          │       }

So it looks like yet another big linked list traversal that needs to be looked at?
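To illustrate why a long linked list walk is a natural suspect here, the sketch below shows one generic alternative: keep the ranges in an array sorted by start address and binary-search it. This is not a description of what the binutils patches in this bug do (the thread does not say), and the names range, ranges_sort and ranges_contain are hypothetical. It also assumes the ranges do not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical half-open address range [low, high). */
struct range {
    uint64_t low;
    uint64_t high;
};

/* qsort comparator: order ranges by their start address. */
static int compare_low(const void *a, const void *b)
{
    const struct range *ra = a;
    const struct range *rb = b;
    return (ra->low > rb->low) - (ra->low < rb->low);
}

/* Sort once after all ranges have been collected. */
void ranges_sort(struct range *ranges, size_t count)
{
    assert(ranges != NULL || count == 0);
    if (count > 1)
        qsort(ranges, count, sizeof *ranges, compare_low);
}

/* Binary search over sorted, non-overlapping ranges: O(log n) per
   lookup instead of the O(n) walk over a linked list. */
bool ranges_contain(const struct range *ranges, size_t count, uint64_t addr)
{
    size_t lo = 0;
    size_t hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < ranges[mid].low)
            hi = mid;            /* addr lies before this range */
        else if (addr >= ranges[mid].high)
            lo = mid + 1;        /* addr lies after this range */
        else
            return true;         /* low <= addr < high */
    }
    return false;
}
```

The trade-off is an up-front sort and a contiguous copy of the ranges, which only pays off when the same set of ranges is queried many times, as appears to be the case when every disassembled address is looked up.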
Hi Parag,

Please could you try out: binutils-2.27-13.fc26

I have added a second patch which should reduce the amount of time objdump spends parsing and dumping the firefox binary. In my local tests the time went from 5-mins,56-sec to 0-mins,33-sec, which is better but still not great. Getting it any faster would mean some serious reworking of the internals of the BFD library however, which is something that I am loath to do unless really necessary.

Cheers
Nick
(In reply to Nick Clifton from comment #6)
> Hi Parag,
>
> Please could you try out: binutils-2.27-13.fc26
>
> I have added a second patch which should reduce the amount of time objdump
> spends parsing and dumping the firefox binary. In my local tests the time
> went from 5-mins,56-sec to 0-mins,33-sec, which is better but still not
> great. Getting it any faster would mean some serious reworking of the
> internals of the BFD library however, which is something that I am loath to
> do unless really necessary.
>
> Cheers
> Nick

Yep, 2.27-13.fc26 does make the same test case run an order of magnitude faster. There's still a ~1 min UI freeze and objdump still eats ~20+GiB of RAM, but I guess we can close this as an acceptable improvement for now, and I will follow up with a separate bug if there is room for further improvement?

Let me know if you agree and I will close this bug.

Thanks for looking into this.
Hi Parag,

(In reply to Parag Warudkar from comment #7)
> There's still a ~1 min UI freeze and objdump still eats ~20+GiB of
> RAM

I think that this is inevitable. There is an awful lot of debug information to be read in and converted into useful data structures, and this just takes time and memory. Of course if perf was not trying to annotate the disassembly with source code information, things would go a lot faster...

> but I guess we can close this as acceptable improvement for now and I
> will follow up with a separate bug if there is place for further
> improvement?

OK.

> Let me know if you agree and I will close this bug.

Yes, please do close it.

Cheers
Nick