| Summary: | `stap -L '**'` too slow | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Alexander Kurtakov <akurtako> |
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 24 | CC: | akurtako, brolley, dsmith, fche, jistone, lberk, mbenitez, mjw, scox, wcohen |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-08 16:37:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Alexander Kurtakov
2016-08-19 11:36:30 UTC
For 'stap -L **' to work well for IDE purposes, I'd think it would need to complete in less than 5 seconds. From 8 minutes to 5 seconds is quite a large speed increase. That is asking a lot, and even if it were possible it isn't going to happen soon. If you'd like to work on it, patches are welcome.
For 'stap -L **', we're going through *lots* of data. Could the execution time be improved? Sure. Will it ever be fast enough for IDE purposes? I'd doubt it.
Also note that 'stap -L **' doesn't really do what I'd guess you are trying to do - list every possible probe point in the system. That information just doesn't exist, and if it did it would take much longer than 8 minutes to generate. You've got to remember that systemtap can probe every possible executable and library on the system.
Instead, could I suggest the following 3 stap options:
--dump-probe-types
    Dumps a list of supported probe types and exits. If --privilege=stapusr is also specified, the list will be limited to probe types available to unprivileged users.
--dump-probe-aliases
    Dumps a list of all probe aliases found in library files and exits.
--dump-functions
    Dumps a list of all the public functions found in library files and exits. Also includes their parameters and types. A function of type 'unknown' indicates a function that does not return a value. Note that not all function/parameter types may be resolved (these are also shown by 'unknown'). This feature is very memory-intensive and thus may not work properly with --use-server if the target server imposes an rlimit on process memory (i.e. through the ~stap-server/.systemtap/rc configuration file, see stap-server(8)).
Perhaps we should back up a bit and you could explain what you are really trying to do and we'll try to figure out how to help you do it.
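For a tool that wants to consume the dump options listed above, a minimal C++ sketch of collecting a command's output line by line might look like the following. The helper name `run_and_collect_lines` is hypothetical, the code assumes a POSIX system where `popen` is available, and it makes no assumption about the format of what stap prints; it only splits standard output into lines.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: runs `command` through the shell (POSIX popen) and
// returns its standard output split into lines, without the trailing '\n'.
// The exit status of the command is deliberately ignored in this sketch.
std::vector<std::string> run_and_collect_lines(const std::string& command) {
    assert(!command.empty());

    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr)
        throw std::runtime_error("popen failed for: " + command);

    std::vector<std::string> lines;
    std::string current;  // accumulates one logical line across fgets calls
    char buffer[4096];

    while (std::fgets(buffer, sizeof buffer, pipe) != nullptr) {
        current += buffer;
        // A line is complete only once a newline has been read; longer
        // lines simply take several fgets calls to assemble.
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();
            lines.push_back(current);
            current.clear();
        }
    }
    // Keep a final line that was not newline-terminated.
    if (!current.empty())
        lines.push_back(current);

    pclose(pipe);
    return lines;
}
```

A caller could pass `"stap --dump-probe-types"` or `"stap --dump-probe-aliases"` and cache the returned vector, so the cost is paid once per session rather than on every lookup.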
(In reply to David Smith from comment #1)
> For 'stap -L **' to work well for IDE purposes, I'd think it would need to
> complete in less than 5 seconds. From 8 minutes to 5 seconds is quite a
> large speed increase. That is asking a lot, and even if it were possible it
> isn't going to happen soon.

Yeah. I don't think stap was always that slow though. And from a strace, it looks like stap is doing a ton of repetitive file system searching, as though elfutils etc. caching is not being performed. I'm not suggesting it's a simple problem, but whatever it is, fixing it may well help normal stap operation too, not just '-L **'.

(In reply to David Smith from comment #1)
> If you'd like to work on it, patches are welcome.

I hope this didn't come across as insulting. Right now I don't believe we have anyone available to work on speeding up 'stap -L **'. If you have the time, we would appreciate a patch to speed this up.
I'd still like to hold a conversation with you about what you are looking for and how we could help you. It might be easier to hold that conversation through email (systemtap) or on irc ('#systemtap' on freenode), rather than here. But whatever mechanism you choose we're happy to try to help.

(In reply to Frank Ch. Eigler from comment #2)
> Yeah. I don't think stap was always that slow though. And from a strace,
> it looks like stap is doing a ton of repetitive file system searching, as
> though elfutils etc. caching is not being performed.

Do you have a sample output that you suspect is because of failed elfutils/libdwfl caching?

(In reply to David Smith from comment #3)
> (In reply to David Smith from comment #1)
> > If you'd like to work on it, patches are welcome.
>
> I hope this didn't come across as insulting. Right now I don't believe we
> have anyone available to work on speeding up 'stap -L **'. If you have the
> time, we would appreciate a patch to speed this up.
>
> I'd still like to hold a conversation with you about what you are looking
> for and how we could help you. It might be easier to hold that conversation
> through email (systemtap) or on irc ('#systemtap' on
> freenode), rather than here. But whatever mechanism you choose we're happy
> to try to help.

Not insulting at all, I know how it works :) but it's highly unlikely to happen from my side in the short term.
This used to take 30-40 seconds, and since we populate the data in another thread and view, that was good enough for us, especially as we cached it: by the time someone opened the view, had files open and started editing, all the data was already available.
We use stap -L ** to get output like "stap.system.spawn ret:long pid:long $arg1:long $arg2:long", which is quite useful when we are about to provide autocomplete and hints. None of the dump options you specified provides such detailed info. We already make use of --dump-probe-types and --dump-functions for some functionality, but haven't found an option that gives info as detailed as "-L **".

One thing I can see is that stap -L '**' calls dwfl_linux_kernel_report_offline over and over again. Since that sets up a Dwfl for the kernel and all modules, that would certainly slow things down.
There is a dwflpp cache in dwarf_builder (tapsets.cxx) that seems designed to prevent creating new Dwfls when the kernel (or a module) has already been seen.
    dwflpp *get_kern_dw(systemtap_session& sess, const string& module)
    {
      if (kern_dw[module] == 0)
        kern_dw[module] = new dwflpp(sess, module, true); // might throw
      return kern_dw[module];
    }
But the above seems to fail since I can see it being called over and over again with the module name "kernel".
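A cache shaped like the one above stores an entry only after the `new dwflpp(...)` expression returns, so if construction throws (as the `// might throw` comment allows), nothing is remembered and the next lookup repeats the expensive setup. As a minimal sketch of negative caching under that assumption, the following remembers failures as well as successes, so each module is attempted at most once. `ModuleInfo`, `ModuleCache`, and the loader callback are hypothetical stand-ins rather than systemtap types, and this is not presented as the change made upstream.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive per-module debuginfo handle.
struct ModuleInfo {
    std::string name;
};

// Caches both successful loads and failed ones (negative caching).
class ModuleCache {
public:
    using Loader = std::function<ModuleInfo(const std::string&)>;

    explicit ModuleCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "a callable loader is required");
    }

    // Returns the cached handle, or nullptr if loading this module is
    // already known to fail. The loader runs at most once per module name.
    const ModuleInfo* get(const std::string& module) {
        auto it = entries_.find(module);
        if (it == entries_.end()) {
            Entry entry;
            try {
                entry.info = std::make_unique<ModuleInfo>(loader_(module));
            } catch (const std::exception& err) {
                // Record the failure instead of leaving no entry behind.
                entry.error = err.what();
            }
            it = entries_.emplace(module, std::move(entry)).first;
        }
        return it->second.info.get();
    }

    // Returns the recorded failure message, or an empty string if the
    // module loaded successfully or has not been requested yet.
    std::string error_for(const std::string& module) const {
        auto it = entries_.find(module);
        return it == entries_.end() ? std::string() : it->second.error;
    }

private:
    struct Entry {
        std::unique_ptr<ModuleInfo> info;  // null for a negative entry
        std::string error;                 // why the load failed, if it did
    };

    Loader loader_;
    std::map<std::string, Entry> entries_;
};
```

The trade-off is that a negative entry stays in place for the lifetime of the cache, which suits a single short-lived run but would need an invalidation step if the underlying files could change while the process is alive.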
The issue seen in comment #6 is because my kernel-debuginfo doesn't match, and there doesn't seem to be any negative caching. So when dwflpp::setup_kernel throws a SEMANTIC_ERROR (which apparently is swallowed silently somewhere) because (offline_search_matches < 1), every time dwarf_builder.build() is called a new Dwfl for the kernel is created again (which involves some directory scanning over and over again).

(In reply to Alexander Kurtakov from comment #5)
... stuff deleted ...
> This used to be in the 30-40 secs before and as we populate the data in
> another thread and view this was good enough for us esp as we cached it so
> as soon as someone opens the view, has files opened and starts editing all
> the data was available already.

Now that's new information. What version of systemtap ran in 30-40 seconds before?

That was during the F20-21 timeframe, when we were actively working on the plugin. It has been in maintenance for the last ~2 years, and I noticed this regression while checking its state last week.
P.S. I can fully understand that this might be due to more probes being available in various packages, so it is simply processing far more data now.

This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.
Thank you for reporting this bug and we are sorry it could not be fixed. |