Bug 503276 - Wildcarded syscall.* probe encurs a large startup time in pass 2, apparently defeating the cache
Summary: Wildcarded syscall.* probe encurs a large startup time in pass 2, apparently ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 11
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Josh Stone
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-05-30 16:03 UTC by Dave Malcolm
Modified: 2010-02-27 20:24 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-27 20:24:42 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Dave Malcolm 2009-05-30 16:03:54 UTC
Description of problem:
I'm trying the following systemtap script:

probe syscall.*
{
  printf ("call to %s(%s) from %s(%d)\n", name, argstr, execname(), pid())
}
probe syscall.*.return
{
  printf ("return from %s into %s(%d)\n", name, execname(), pid())
}

i.e. with two wildcarded probes on all syscalls.

There is a long delay when I start the script, every time I start it, not just the first time.

Running with -v -p4 , I see that pass 1 completes very quickly, as do passes 3 and 4, using cached results,  but pass 2 seems to take over 10 seconds each time.

Pass 1: parsed user script and 49 library script(s) in 360usr/10sys/385real ms.
Pass 2: analyzed script: 630 probe(s), 102 function(s), 14 embed(s), 1 global(s) in 12570usr/410sys/13318real ms.
/root/.systemtap/cache/83/stap_83b9d8fdd3f2b82db13e9f298d732c13_327705.ko
Pass 3: using cached /root/.systemtap/cache/83/stap_83b9d8fdd3f2b82db13e9f298d732c13_327705.c
Pass 4: using cached /root/.systemtap/cache/83/stap_83b9d8fdd3f2b82db13e9f298d732c13_327705.ko


Version-Release number of selected component (if applicable):
systemtap-0.9.5-1.fc10.i386
systemtap-runtime-0.9.5-1.fc10.i386
kernel-2.6.27.12-170.2.5.fc10.i686
kernel-devel-2.6.27.12-170.2.5.fc10.i686
kernel-debuginfo-2.6.27.12-170.2.5.fc10.i686
kernel-headers-2.6.27.12-170.2.5.fc10.i386
kernel-debuginfo-common-2.6.27.12-170.2.5.fc10.i686

How reproducible:
100% with the given script, appears to affect other kinds of wildcarding.

Steps to Reproduce:
1. "stap -v -p 4 syscalls.stp" with syscalls.stp as the script above
  
Actual results:
10-15 second delay starting the script each time, occurring in pass 2.

Expected results:
Repeated invocations of the script should hit caches, and take under a second to start.

Additional info:
I'm trying to write a UI for systemtap (just a hobby/sideproject at this stage), and this 10-15 second startup each time is a nuisance.

Comment 1 Frank Ch. Eigler 2009-06-02 01:42:22 UTC
The pass-2 time is unfortunately needed to identify the "cache line"
in the compiled module cache, so it doesn't "defeat" the cache -- it's
a part of using it.  But clearly this pass needs to be optimized further.

Comment 2 Dave Malcolm 2009-06-03 17:17:50 UTC
Thanks.  

(I was being sloppy in my use of the word "cache"; taking an educated guess it looks like there's a cache at pass 3, caching from parsed stp scripts on their md5sum and another value to generated .c files, and another one at pass 4, caching from .c files on their md5sums and another value to .ko files.  Can some kind of caching be added to pass 2? e.g. look at timestamps of the relevant input files, or perhaps the tapset directory, and if nothing's changed, reuse the processed stp script with tapsets already resolved? for a shortcut through on repeated invocations.  I may be misunderstanding the innards of stap though...)

Comment 3 Bug Zapper 2009-06-09 16:50:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Frank Ch. Eigler 2009-06-17 21:07:42 UTC
Since Josh is improving this area, reassigning to him.

Comment 5 Josh Stone 2009-06-17 21:24:46 UTC
Resolving and hashing all of the inputs to pass-2 is not all that trivial -- it requires all input scripts, tapsets, kernel binaries w/ debuginfo, and user binaries w/ debuginfo.  I think simply optimizing this path is the better route.

For the systemtap 0.9.8 release we made some improvements in this area.  Your example script on 0.9.5 takes me 8010usr/430sys/8451real ms in pass-2, and on 0.9.8 that's down to 3520usr/240sys/3761real ms.  I hope that is a noticeable improvement in your scenario, but we'll continue to look for more fat to trim.

Comment 6 Frank Ch. Eigler 2009-07-29 14:00:40 UTC
Dave, would you be able to try building git systemtap to try this?
A recent change went in (and will go into the next release, whenever)
that should dramatically speed this up.

Comment 7 Dave Malcolm 2009-07-30 08:59:34 UTC
Thanks.  I'm seeing similar improvements to the ones reported in comment 5, reducing the "warm cache" case from about 13 seconds to about 5 seconds; real time for pass 2 has reduced from about 13.3 seconds to about 4.8 seconds.

Details:

Old version:
[david@brick stap]$ rpm -q systemtap
systemtap-0.9.5-1.fc10.i386
[david@brick stap]$ time stap -v -p 4 syscalls.stp
Pass 1: parsed user script and 49 library script(s) in 380usr/10sys/418real ms.
Pass 2: analyzed script: 630 probe(s), 102 function(s), 14 embed(s), 1 global(s) in 12750usr/430sys/13307real ms.
/home/david/.systemtap/cache/db/stap_db2b4881ad1f4745e5d7ed03183bd403_327705.ko
Pass 3: using cached /home/david/.systemtap/cache/db/stap_db2b4881ad1f4745e5d7ed03183bd403_327705.c
Pass 4: using cached /home/david/.systemtap/cache/db/stap_db2b4881ad1f4745e5d7ed03183bd403_327705.ko

real	0m13.860s
user	0m13.163s
sys	0m0.519s

Local build from git (83ca3872dadc89c5a8155150bf76d5c09890dcc9)
"cold cache" case:
[david@brick stap]$ time ../systemtap-0.9.8-4935/bin/stap -v -p 4 syscalls.stp
Pass 1: parsed user script and 58 library script(s) in 350usr/30sys/3642real ms.
Pass 2: analyzed script: 630 probe(s), 102 function(s), 15 embed(s), 1 global(s) in 4720usr/50sys/4896real ms.
Pass 3: translated to C into "/tmp/stapfAW31w/stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.c" in 780usr/50sys/1000real ms.
/home/david/.systemtap/cache/5f/stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.ko
Pass 4: compiled C into "stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.ko" in 16410usr/500sys/20268real ms.

real	0m31.242s
user	0m22.302s
sys	0m0.685s

"warm cache" case:
[david@brick stap]$ time ../systemtap-0.9.8-4935/bin/stap -v -p 4 syscalls.stp
Pass 1: parsed user script and 58 library script(s) in 220usr/10sys/245real ms.
Pass 2: analyzed script: 630 probe(s), 102 function(s), 15 embed(s), 1 global(s) in 4690usr/60sys/4801real ms.
/home/david/.systemtap/cache/5f/stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.ko
Pass 3: using cached /home/david/.systemtap/cache/5f/stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.c
Pass 4: using cached /home/david/.systemtap/cache/5f/stap_5fe74f3ef6bbe22ae79f5553c3600695_328134.ko

real	0m5.136s
user	0m4.946s
sys	0m0.093s

Comment 8 Josh Stone 2009-10-07 23:16:41 UTC
With stap 1.0, I'm now seeing pass-2 at 1340usr/50sys/1392real ms.

Comment 9 Frank Ch. Eigler 2010-02-27 16:19:23 UTC
I'm seeing very short times now in pass2 also, and proper caching in pass 3/4.
David, are you still seeing anomalous times?

Comment 10 Dave Malcolm 2010-02-27 17:18:39 UTC
Pass 2 is now taking 2-2.5 real seconds for the "hot" case, so this has improved; thanks!

[david@brick stap]$ stap -v -p4 syscalls.stp 
Pass 1: parsed user script and 65 library script(s) using 20136virt/12264res/2056shr kb, in 220usr/10sys/232real ms.
Pass 2: analyzed script: 630 probe(s), 110 function(s), 16 embed(s), 1 global(s) using 180820virt/114452res/73688shr kb, in 2130usr/200sys/2559real ms.
/home/david/.systemtap/cache/93/stap_937796f3b632292c5a1c856192c48729_342846.ko
Pass 3: using cached /home/david/.systemtap/cache/93/stap_937796f3b632292c5a1c856192c48729_342846.c
Pass 4: using cached /home/david/.systemtap/cache/93/stap_937796f3b632292c5a1c856192c48729_342846.ko

[david@brick stap]$ rpm -qa systemtap\* kernel\* | sort
kernel-debuginfo-common-i686-2.6.31.12-174.2.3.fc12.i686
kernel-firmware-2.6.31.12-174.2.3.fc12.noarch
kernel-headers-2.6.31.12-174.2.3.fc12.i686
kernel-PAE-2.6.31.12-174.2.3.fc12.i686
kernel-PAE-2.6.31.6-145.fc12.i686
kernel-PAE-2.6.31.9-174.fc12.i686
kernel-PAE-debuginfo-2.6.31.12-174.2.3.fc12.i686
kernel-PAE-devel-2.6.31.12-174.2.3.fc12.i686
kernel-PAE-devel-2.6.31.6-145.fc12.i686
kernel-PAE-devel-2.6.31.9-174.fc12.i686
systemtap-1.1-2.fc12.i686
systemtap-runtime-1.1-2.fc12.i686
systemtap-sdt-devel-1.1-1.fc12.i686

Comment 11 Frank Ch. Eigler 2010-02-27 20:24:42 UTC
thanks for confirming!


Note You need to log in before you can comment on or make changes to this bug.