Bug 1397113

Summary: objdump high CPU usage in libbfd:comp_unit_contains_address.part.13
Product: Fedora
Reporter: Parag Warudkar <parag.lkml>
Component: binutils
Assignee: Nick Clifton <nickc>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 25
CC: jakub, nickc, parag.lkml
Hardware: x86_64
OS: Linux
Last Closed: 2017-01-12 18:30:03 UTC
Type: Bug

Description Parag Warudkar 2016-11-21 16:17:39 UTC
Description of problem:
Trying to profile firefox CPU usage using perf record/report. Debuginfo packages for firefox and all of its dependencies are installed. After recording for a short time (30-45 s) and opening perf report, if I navigate to Annotate for the symbol with the highest % overhead (which symbol doesn't matter, but here it happens to be in libxul), the perf UI freezes for a very long time.

top shows objdump using 100% CPU for several minutes. Profiling objdump itself produces the annotated perf output below for libbfd:comp_unit_contains_address.part.13, which perf report lists as having >95% overhead.

       │      arange = &unit->arange;
       │      add    $0x18,%rdi
       │      nop
       │      do
       │        {
       │          if (addr >= arange->low && addr < arange->high)
 96.18 │      cmp    %rsi,0x8(%rdi)
  0.01 │    ↓ ja     b34
  2.16 │      cmp    0x10(%rdi),%rsi
       │    ↓ jb     b40
       │            return TRUE;
       │          arange = arange->next;
  1.37 │      mov    (%rdi),%rdi

Version-Release number of selected component (if applicable):
libbfd-2.26.1-1.fc25

How reproducible:
Always

Steps to Reproduce:
1. Install perf and firefox-debuginfo, then run firefox with 2-3 tabs loaded.
2. Profile using perf record -p <FF_PID> for 30s or more. Run the report with perf report. 
3. Click on the highest overhead listing and then click Annotate 

Actual results:
Perf UI freezes.
objdump, run as root, then takes 100% CPU for a very long time.

Expected results:
Either 1) objdump should be faster, or 2) the results should at least be cached, so that navigating back and forth in the perf UI doesn't freeze it every time.

Additional info:

Comment 1 Nick Clifton 2016-11-21 16:56:45 UTC
Hi Parag,

  Have you tried the latest development binutils sources? A patch was recently contributed to significantly improve objdump's performance, and this might be what you are looking for:

  https://sourceware.org/ml/binutils/2016-11/msg00050.html

Cheers
  Nick

Comment 2 Parag Warudkar 2016-11-21 18:27:28 UTC
Hi Nick, thanks for the pointer; it looks promising. I need to find out whether the rawhide version of binutils has that patch integrated. I'm not sure I want to rebuild from patched sources right now.

Comment 3 Nick Clifton 2016-11-22 17:25:21 UTC
(In reply to Parag Warudkar from comment #2)

> I need to find out if rawhide version of binutils has that patch integrated

It doesn't, yet ...  I will ping you when it is in.

Cheers
  Nick

Comment 4 Nick Clifton 2016-11-22 17:50:05 UTC
The patch is now in.  Please give binutils-2.27-12.fc26 a try...

Comment 5 Parag Warudkar 2016-11-27 00:01:41 UTC
No change with binutils-2.27-12.fc26:

objdump -v
GNU objdump version 2.27-12.fc26

       │      arange = &unit->arange;
       │      add    $0x18,%rdi
       │      nop
       │      do
       │        {
       │          if (addr >= arange->low && addr < arange->high)
 96.15 │      cmp    %rsi,0x8(%rdi)
  0.01 │    ↓ ja     b64
  2.20 │      cmp    0x10(%rdi),%rsi
  0.00 │    ↓ jb     b70
       │            return TRUE;
       │          arange = arange->next;
  1.34 │      mov    (%rdi),%rdi
       │        }

So it looks like yet another big linked-list traversal that needs to be looked at?

Comment 6 Nick Clifton 2017-01-09 16:16:58 UTC
Hi Parag,

  Please could you try out: binutils-2.27-13.fc26 

  I have added a second patch which should reduce the amount of time objdump spends parsing and dumping the firefox binary. In my local tests the time went from 5 min 56 s to 0 min 33 s, which is better but still not great. Getting it any faster would mean some serious reworking of the internals of the BFD library, however, which is something that I am loath to do unless really necessary.

Cheers
  Nick

Comment 7 Parag Warudkar 2017-01-11 00:43:36 UTC
(In reply to Nick Clifton from comment #6)
> Hi Parag,
> 
>   Please could you try out: binutils-2.27-13.fc26 
> 
>   I have added a second patch which should reduce the amount of time objdump
> spends parsing and dumping the firefox binary.  In my local tests the time
> went from 5 min 56 s to 0 min 33 s, which is better but still not
> great.   Getting it any faster would mean some serious reworking of the
> internals of the BFD library, however, which is something that I am loath to
> do unless really necessary.
> 
> Cheers
>   Nick

Yep, 2.27-13.fc26 does make the same test case run an order of magnitude faster. There's still a ~1 min UI freeze and objdump still eats ~20+GiB of RAM, but I guess we can close this as an acceptable improvement for now, and I will follow up with a separate bug if there is room for further improvement? Let me know if you agree and I will close this bug.

Thanks for looking into this.

Comment 8 Nick Clifton 2017-01-11 15:14:43 UTC
Hi Parag,

(In reply to Parag Warudkar from comment #7)

> There's still a ~1 min UI freeze and objdump still eats ~20+GiB of
> RAM

I think that this is inevitable.  There is an awful lot of debug information
to be read in and converted into useful data structures, and this just takes
time and memory.  Of course if perf was not trying to annotate the disassembly
with source code information, things would go a lot faster...

> but I guess we can close this as an acceptable improvement for now, and I
> will follow up with a separate bug if there is room for further
> improvement? 

OK.

> Let me know if you agree and I will close this bug.

Yes, please do close it.

Cheers
  Nick