Bug 1368433 - `stap -L '**'` too slow
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 24
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Frank Ch. Eigler
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2016-08-19 11:36 UTC by Alexander Kurtakov
Modified: 2017-08-08 16:37 UTC (History)

Last Closed: 2017-08-08 16:37:32 UTC
Type: Bug


Description Alexander Kurtakov 2016-08-19 11:36:30 UTC
Running `stap -L '**'` takes 8 minutes on a fully up-to-date Fedora 24 machine. The machine has many libraries and devel packages installed, but this is still far too much time spent, as it makes listing all probes (for IDE purposes) impossible.

Comment 1 David Smith 2016-08-19 18:04:50 UTC
For 'stap -L **' to work well for IDE purposes, I'd think it would need to complete in less than 5 seconds. From 8 minutes to 5 seconds is quite a large speed increase. That is asking a lot, and even if it were possible it isn't going to happen soon. If you'd like to work on it, patches are welcome.

For 'stap -L **', we're going through *lots* of data. Could the execution time be improved? Sure. Will it ever be fast enough for IDE purposes? I'd doubt it.

Also note that 'stap -L **' doesn't really do what I'd guess you are trying to do - list every possible probe point in the system. That information just doesn't exist, and if it did it would take much longer than 8 minutes to generate. You've got to remember that systemtap can probe every possible executable and library on the system.

Instead, could I suggest the following 3 stap options:

       --dump-probe-types
              Dumps a list of supported probe types and exits. If
              --privilege=stapusr is also specified, the list will be limited
              to probe types available to unprivileged users.

       --dump-probe-aliases
              Dumps a list of all probe aliases found in library files and
              exits.

       --dump-functions
              Dumps a list of all the public functions found in library files
              and exits. Also includes their parameters and types. A function
              of type 'unknown' indicates a function that does not return a
              value. Note that not all function/parameter types may be
              resolved (these are also shown by 'unknown'). This feature is
              very memory-intensive and thus may not work properly with
              --use-server if the target server imposes an rlimit on process
              memory (i.e. through the ~stap-server/.systemtap/rc
              configuration file, see stap-server(8)).

Perhaps we should back up a bit and you could explain what you are really trying to do and we'll try to figure out how to help you do it.

Comment 2 Frank Ch. Eigler 2016-08-23 17:44:11 UTC
(In reply to David Smith from comment #1)
> For 'stap -L **' to work well for IDE purposes, I'd think it would need to
> complete in less than 5 seconds. From 8 minutes to 5 seconds is quite a
> large speed increase. That is asking a lot, and even if it were possible it
> isn't going to happen soon.

Yeah.  I don't think stap was always that slow though.  And from a strace, it looks like stap is doing a ton of repetitive file system searching, as though elfutils etc. caching is not being performed.  I'm not suggesting it's a simple problem, but whatever it is, it may well help normal stap operation too, not just '-L **'.

Comment 3 David Smith 2016-08-23 17:54:26 UTC
(In reply to David Smith from comment #1)
> If you'd like to work on it, patches are welcome.

I hope this didn't come across as insulting. Right now I don't believe we have anyone available to work on speeding up 'stap -L **'. If you have the time, we would appreciate a patch to speed this up.

I'd still like to hold a conversation with you about what you are looking for and how we could help you. It might be easier to hold that conversation through email (systemtap) or on irc ('#systemtap' on freenode), rather than here. But whatever mechanism you choose we're happy to try to help.

Comment 4 Mark Wielaard 2016-08-23 18:00:00 UTC
(In reply to Frank Ch. Eigler from comment #2)
> Yeah.  I don't think stap was always that slow though.  And from a strace,
> it looks like stap is doing a ton of repetitive file system searching, as
> though elfutils etc. caching is not being performed.

Do you have a sample output that you suspect is because of failed elfutils/libdwfl caching?

Comment 5 Alexander Kurtakov 2016-08-23 19:23:30 UTC
(In reply to David Smith from comment #3)
> (In reply to David Smith from comment #1)
> > If you'd like to work on it, patches are welcome.
> 
> I hope this didn't come across as insulting. Right now I don't believe we
> have anyone available to work on speeding up 'stap -L **'. If you have the
> time, we would appreciate a patch to speed this up.
> 
> I'd still like to hold a conversation with you about what you are looking
> for and how we could help you. It might be easier to hold that conversation
> through email (systemtap) or on irc ('#systemtap' on
> freenode), rather than here. But whatever mechanism you choose we're happy
> to try to help.

Not insulting at all, I know how it works :) but it's highly unlikely to happen from my side in the short term.
This used to take 30-40 seconds, and since we populate the data in a separate thread and view, that was good enough for us, especially as we cached it: by the time someone opened the view, had files open, and started editing, all the data was already available.
We use stap -L ** to get output like "stap.system.spawn ret:long pid:long $arg1:long $arg2:long", which is quite useful when we provide autocomplete and hints. None of the dump options you mentioned provide such detailed info. We already make use of --dump-probe-types and --dump-functions for some functionality but haven't found an option that gives info as detailed as "-L **".
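
To illustrate how an IDE might consume a listing line like the one quoted above, here is a minimal C++ sketch. It assumes the simple form shown in this comment only: a probe name followed by whitespace-separated name:type pairs. The names ProbeListing and parse_listing_line are hypothetical and not part of systemtap or of the plugin; types that contain spaces are not handled and would need a more careful tokenizer.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one line of listing output.
struct ProbeListing {
  std::string probe;                                      // e.g. "stap.system.spawn"
  std::vector<std::pair<std::string, std::string>> vars;  // (name, type) pairs
};

// Parse "probe name:type name:type ..." as quoted in the comment above.
// Assumption: tokens are separated by whitespace and no type contains a space.
ProbeListing parse_listing_line(const std::string& line) {
  ProbeListing out;
  std::istringstream in(line);
  in >> out.probe;  // first token is the probe point

  std::string token;
  while (in >> token) {
    // Split each remaining token at its last ':' into name and type.
    const std::string::size_type colon = token.rfind(':');
    if (colon == std::string::npos)
      out.vars.emplace_back(token, "");  // no type given
    else
      out.vars.emplace_back(token.substr(0, colon), token.substr(colon + 1));
  }
  return out;
}
```

An autocomplete provider could then index the results by probe name and offer the variable names (ret, pid, $arg1, ...) together with their types as hints.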

Comment 6 Mark Wielaard 2016-08-23 20:15:34 UTC
One thing I can see is that stap -L '**' calls dwfl_linux_kernel_report_offline over and over again. Since that sets up a Dwfl for the kernel and all modules, it would certainly slow things down.

There is a dwflpp cache in dwarf_builder (tapsets.cxx) that seems designed to prevent creating new Dwfls when the kernel (or a module) has already been seen.

  dwflpp *get_kern_dw(systemtap_session& sess, const string& module)
  {
    if (kern_dw[module] == 0)
      kern_dw[module] = new dwflpp(sess, module, true); // might throw
    return kern_dw[module];
  }

But the above seems to fail since I can see it being called over and over again with the module name "kernel".

Comment 7 Mark Wielaard 2016-08-23 20:52:19 UTC
The issue seen in comment #6 is because my kernel-debuginfo doesn't match.
And there doesn't seem to be any negative caching. So when dwflpp::setup_kernel throws a SEMANTIC_ERROR because offline_search_matches < 1 (an error that is apparently swallowed silently somewhere), every call to dwarf_builder.build() creates a new Dwfl for the kernel again, which repeats the directory scanning each time.
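
For illustration, negative caching could look roughly like the sketch below. This is not systemtap code: ModuleInfo, ModuleCache and Loader are hypothetical stand-ins, and std::runtime_error stands in for the semantic error. The idea is to record a failed load as a null entry and to check for the presence of the key rather than for a non-null value. In the get_kern_dw snippet from comment #6, by contrast, a throwing constructor leaves kern_dw[module] at 0, so the next call retries the expensive setup.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive per-module debuginfo handle.
struct ModuleInfo {
  std::string name;
};

// Cache that remembers failures as well as successes (negative caching).
class ModuleCache {
public:
  // Hypothetical loader: returns a handle or throws std::runtime_error.
  using Loader = std::function<std::unique_ptr<ModuleInfo>(const std::string&)>;

  explicit ModuleCache(Loader loader) : loader_(std::move(loader)) {}

  // Returns the cached handle, or nullptr if loading failed now or earlier.
  ModuleInfo* get(const std::string& module) {
    auto it = cache_.find(module);
    if (it != cache_.end())
      return it->second.get();  // hit; may be a remembered failure (nullptr)

    std::unique_ptr<ModuleInfo> info;
    try {
      info = loader_(module);   // expensive; might throw
    } catch (const std::runtime_error&) {
      // Leave info null so the failure itself is cached below.
    }
    auto res = cache_.emplace(module, std::move(info));
    return res.first->second.get();
  }

private:
  Loader loader_;
  std::map<std::string, std::unique_ptr<ModuleInfo>> cache_;
};
```

With this shape, a module whose debuginfo does not match costs one failed load per session instead of one per lookup. A real fix would also have to decide whether and how to report the cached failure to the caller.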

Comment 8 David Smith 2016-08-24 12:58:07 UTC
(In reply to Alexander Kurtakov from comment #5)

... stuff deleted ...

> This used to be in the 30-40 secs before and as we populate the data in
> another thread and view this was good enough for us esp as we cached it so
> as soon as someone opens the view, has files opened and starts editing all
> the data was available already.

Now that's new information. What version of systemtap ran in 30-40 seconds before?

Comment 9 Alexander Kurtakov 2016-08-24 13:31:29 UTC
That was during the F20-21 timeframe, when we were actively working on the plugin. It has been in maintenance mode for the last ~2 years, and I noticed this regression while checking its state last week.
P.S. I fully understand that this may be because more probes are now available in various packages, so stap is simply processing far more data.

Comment 10 Fedora End Of Life 2017-07-25 22:31:03 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2017-08-08 16:37:32 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

