Bug 545179 - RFE: add systemtap static probes to python runtime
Summary: RFE: add systemtap static probes to python runtime
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: python
Version: rawhide
Hardware: All
OS: Linux
high
medium
Target Milestone: ---
Assignee: Dave Malcolm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 548865 563541
Blocks: StapStaticProbesF13 569695
TreeView+ depends on / blocked
 
Reported: 2009-12-07 19:53 UTC by Dave Malcolm
Modified: 2010-05-27 15:53 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 569695 (view as bug list)
Environment:
Last Closed: 2010-05-27 15:53:53 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Patch from wcohen's srpm (against 2.6.2), for reference (17.14 KB, patch)
2009-12-18 20:49 UTC, Dave Malcolm
no flags Details | Diff
For reference: patch I'm using against 2.6.4 (16.12 KB, patch)
2009-12-18 21:24 UTC, Dave Malcolm
no flags Details | Diff
Notes on benchmarking of python-2.6.4-dtrace.patch (11.19 KB, text/plain)
2010-02-10 20:15 UTC, Dave Malcolm
no flags Details
Minimal patch for method entry/exit only (5.96 KB, patch)
2010-02-10 21:04 UTC, Mark Wielaard
no flags Details | Diff
Notes on benchmarking of Python 2.6.4 patched with attachment 390110 (10.51 KB, text/plain)
2010-02-10 22:19 UTC, Dave Malcolm
no flags Details

Comment 1 Dave Malcolm 2009-12-09 16:08:58 UTC
mjw: what's the current status of this?

As I understand it, at our hackfest at FUDcon you were able to use one of the patches from the upstream tracker (at http://bugs.python.org/issue4111 ) and rebuild a tarball of Python-2.6.2, statically linked, against a local build of systemtap, and use that to trace function_entry and function_return.

So I think the remaining work here is:
  - ensure that systemtap in rawhide for F13 has the necessary fixes; http://gnu.wildebeest.org/diary/2009/12/07/fudcon-success-systemtap-meets-python/comment-page-1/#comment-94713 suggests that this is in upstream git: http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=2d971c6b8835c7e38ce78dd554827f5f96bc7c04   
     - ISSUE: Is this fix in rawhide's systemtap RPMs yet?  Will other fixes be necessary?
  - get it working against a dynamically-linked build of Python
     - ISSUE: what's the status of this?
  - profile it (e.g. using the Unladen Swallow benchmarks) to ensure that there isn't a serious performance impact.
  - enable it in our Python SRPMs for F13
  - notify upstream 
  - update the testing/doc info on the Feature page to highlight the things that are possible (e.g. the script above at http://gnu.wildebeest.org/diary/2009/12/07/fudcon-success-systemtap-meets-python/comment-page-1/#comment-94713 )
  - implement other tracepoints/tapsets as time permits (see the feature page for ideas e.g. GIL instrumentation)

Thanks!

Comment 2 William Cohen 2009-12-09 16:44:00 UTC
There is a python f12 src rpm at:

http://people.redhat.com/wcohen/fudcon2009/python-2.6.2-5.fc12.src.rpm

This needs a version of systemtap from the systemtap cvs to build that includes the dtrace script that understands the "-I" option. This is not yet in rawhide. I have put a suitable systemtap source rpm in that same directory as the python source rpm. Once the new version of systemtap is released in the near future, should be able to use the new official release.

http://people.redhat.com/wcohen/fudcon2009/systemtap-1.0-1.fc12.python.src.rpm

I was able to get probes working on the shared library with the locally built f12 stap patched python. Not sure why Mark's build had problems with it. There is a little example script in http://people.redhat.com/wcohen/fudcon2009/ for x86_64 machines. It will need to be adjusted for i686. Really should include a tapset in the patch to avoid those grudgy details.

I tried to apply the patches to the f13 python source rpm , but the f13 python source rpm does not currently build. David, could you check into that?

There are some areas where the patch has very machine specific code, e.g. "#ifdef (__i386)". Really need to check that the patched python is going to build on other machine.

Comment 3 Dave Malcolm 2009-12-18 20:49:30 UTC
Created attachment 379291 [details]
Patch from wcohen's srpm (against 2.6.2), for reference

Comment 4 Dave Malcolm 2009-12-18 21:01:13 UTC
(In reply to comment #2) 
> I tried to apply the patches to the f13 python source rpm , but the f13 python
> source rpm does not currently build. David, could you check into that?
This was probably due to bug 544275; fixed now.

I've regenerated the patch against 2.6.4 and am attempting a scratch build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1880337
and it fails due to the dtrace -I path issue mentioned in comment #1:
http://koji.fedoraproject.org/koji/getfile?taskID=1880338&name=build.log

(from build log:)
dtrace -o Python/phelper.o -DPHELPER  -I. -IInclude -I./Include -I/usr/lib64/libffi-3.0.5/include   -C -G -s ./Include/phelper.d
gcc -pthread -c -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC    -I. -IInclude -I./Include -I/usr/lib64/libffi-3.0.5/include   -fPIC -DPy_BUILD_CORE -o Python/thread.o Python/thread.c
In file included from ./Include/phelper.d:20:
./Include/pyport.h:4:48: error: pyconfig.h: No such file or directory
./Include/pyport.h:667:2: error: #error "could not set LONG_MAX in pyport.h"
"cpp s_filename" failed
Usage /usr/bin/dtrace [--help] [-h | -G] -s File.d [-o File]
make: *** [Python/phelper.o] Error 1

So I can't enable this in rawhide until the systemtap rpm in rawhide has the fix.

Comment 5 Dave Malcolm 2009-12-18 21:24:17 UTC
Created attachment 379300 [details]
For reference: patch I'm using against 2.6.4

Comment 6 Dave Malcolm 2010-02-09 23:22:19 UTC
From the feature page status:
DONE:
Built Python 2 with Systemtap patch adding function__entry and function__return static markers: http://koji.fedoraproject.org/koji/buildinfo?buildID=155204 (tagged as python-2_6_4-17_fc13)
Python 2 tapset written, hiding some of the complexity of the static markers and providing python.function.entry and python.function.return; example of usage added to docs in python-libs subpackage, logging all Python function calls/return across whole system. Built as http://koji.fedoraproject.org/koji/buildinfo?buildID=155238 (tagged as python-2_6_4-18_fc13); example script tested on i386 only so far.

TODO:
Test performance
Double-check generated machine code
Test with and without probes, on both architectures
More documentation
Additional probepoints as listed under "Scope" below
All of the above for Python 3

(Help with the python3 port would be useful!)

Comment 7 Dave Malcolm 2010-02-09 23:31:43 UTC
Putting a breakpoint in gdb on PyCode_Addr2Line, I see that the stap static probe is adding an additional call to PyCode_Addr2Line per Python function call, and this happens whether or not the probe-point is enabled; this function is non-trivial, and might be noticable in profiles.  Is there a way to rework the systemtap probe so this call is only made if the probepoint is in use?

Comment 8 Frank Ch. Eigler 2010-02-09 23:54:55 UTC
It should be possible not to call that if a probing session is not active.

High-overhead prep code for calling sdt.h macros should be protected by
a predicate such as this blurb:

+        if (PYTHON_FUNCTION_ENTRY_ENABLED())
+                dtrace_entry(f);

It seems as though a similar predicate call would be useful in a few other
places where the addr2line function is called under #ifdef WITH_DTRACE.

Comment 9 Mark Wielaard 2010-02-10 09:14:27 UTC
(In reply to comment #7)
> Putting a breakpoint in gdb on PyCode_Addr2Line, I see that the stap static
> probe is adding an additional call to PyCode_Addr2Line per Python function
> call, and this happens whether or not the probe-point is enabled; this function
> is non-trivial, and might be noticable in profiles.  Is there a way to rework
> the systemtap probe so this call is only made if the probepoint is in use?    

There are two places this might happen. In the dtrace_entry() and dtrace_return() functions and in the call site code in the PyEval_EvalFrame() loop.

The first two dtrace_entry() and dtrace_return() should already be guarded by PYTHON_FUNCTION_ENTRY_ENABLED() and PYTHON_FUNCTION_RETURN_ENABLED(), so shouldn't be called when systemtap probes are disabled. Could you check to see if that part is working correctly?

The second usage in the PyEval_EvalFrame() loop is trickier. The code wants to store the call site line in the PyFrameObject f_calllineno field. This is for enabling python code backtracing based on crawling the stack, recovering the PyFrameObject and grabbing the f_calllineno field to show where we were called from. These cannot easily be guarded by the ENABLED macros since we might need them later when probing is turned on and someone calls a (not yet implemented) py_stack() stap function.

If the first usage is impacting performance noticeably when probes aren't enabled ways then that is clearly a bug in systemtap, the ENABLED macros should make this usage completely unnoticeable.

If the second usage is impacting performance noticeably then they can just be removed for now (then also remove the definition of f_calllineno in PyFrameObject) and we will have to figure out some other way to retrieve the line numbers on a backtrace. We might just have to implement the co_lnotab extraction from PyCode_Addr2Line in a tapset function.

How do you handle backtraces in the python gdb macros?

Comment 10 Dave Malcolm 2010-02-10 17:09:58 UTC
I have some concerns with the python systemtap patch as-is:
- worries about performance.  I hope to run a benchmark suite today with/without the patch, and I'll add notes to bz on how to do this so that people can run the benchmarks themselves
- there's some dubious-looking arch-specific code in the patch which makes me worry about the need for arch-specific testing, and breaks the build on e.g. ppc/ppc64 (this is bug 563541)
- the patch could do more, adding more markers and tapset wrappers in less performance-critical sections e.g. "python.exception.raised", say, "python.warning" etc (see the ideas in the scope on https://fedoraproject.org/wiki/Features/SystemtapStaticProbes#Python)
- patch is not yet upstream


How much of the DTrace patch is actually needed for Systemtap?

The current patch is here:
http://cvs.fedoraproject.org/viewvc/rpms/python/devel/python-2.6.4-dtrace.patch?revision=1.1&view=markup

Going through it hunk by hunk:
  - configure.in: changes look reasonable, but they don't include the "AC_ARG_WITH([tapset-install-dir]," stuff suggested on http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps - I wonder if they should?  Currently python.spec treats the tapset as a separate source and installs them during %install, rather than as part of the Makefile (with some logic to cover lib vs lib64 and to ensure the correct dso numbering in the path; see: http://cvs.fedoraproject.org/viewvc/rpms/python/devel/python.spec?r1=1.169&r2=1.170 ).  Are tapsets a systemtap-specific concept not found in dtrace? (if so, then perhaps it makes more sense to try to merge this upstream as a systemtap patch, including the tapsets).
  - Include/frameobject.h: adds a f_calllineno field to struct PyFrameObject; this change gives me concerns about performance (see below)
  - Include/phelper.d: (new file): seems to exist in order to extract user-space data from the Python process.  Is this DTrace-specific?  Am I right in thinking this stuff is done for us by the DWARF-handling hooks in Systemtap?
  - Include/pydtrace.d: (new file): am I right in thinking that Systemtap's implementation of /usr/bin/dtrace uses this?
  - Makefile.pre.in: adds building the headers pydtrace.h and phelper.h, and building phelper.o and dtrace.o; again, how much of this do we use with a systemtap build? 
  - Objects/frameobject.c: sets f->f_calllineno to top of function; shouldn't be too much of a perf impact
  - pyconfig.h.in: adds #undef WITH_DTRACE, but also adds a bunch of other things that don't appear to be related at first glance
  - Python/ceval.c:
    - Note that this file is the central interpreter loop, and small changes here can have a big impact on Python's performance.
    - Patch adds dtrace_entry and dtrace_return hooks (with what apprears to be a __sparc-specific workaround)
    - Patch does "shenanigans look like utter madness...to make sure that the ustack helper will see the PyFrameObject pointer on the stack" which:
      - on some architectures adds a "PyEval_EvalFrameExReal" separate from "PyEval_EvalFrameEx"
      - wires up the calls to dtrace_entry/return
      - adds calls to PyCode_Addr2Line() to set the newly-added f->f_calllineno field (see the change to Include/frameobject.h).  As mentioned in comment #7, I'm concerned about these: this is a non-trivial function, and I expect it to have an impact on performance.

The new "f->f_calllineno" field appears to only get used by "this->lineno = this->frameo->f_calllineno;" within Include/phelper.d, the "Python ustack helper".  If this is DTrace only, can we do a Systemtap-specific patch which eliminates this new field?

Comment 11 Mark Wielaard 2010-02-10 19:38:04 UTC
(In reply to comment #10)
> I have some concerns with the python systemtap patch as-is:
> - worries about performance.  I hope to run a benchmark suite today
> with/without the patch, and I'll add notes to bz on how to do this so that
> people can run the benchmarks themselves

Thanks

> - there's some dubious-looking arch-specific code in the patch which makes me
> worry about the need for arch-specific testing, and breaks the build on e.g.
> ppc/ppc64 (this is bug 563541)

I commented on that bug, that is indeed unnecessary for systemtap usage.

> - the patch could do more, adding more markers and tapset wrappers in less
> performance-critical sections e.g. "python.exception.raised", say,
> "python.warning" etc (see the ideas in the scope on
> https://fedoraproject.org/wiki/Features/SystemtapStaticProbes#Python)

Yes, but maybe keep that for later once we are 

> - patch is not yet upstream

Yeah, and I suggested to keep the patch as close to what was suggested upstream as possible, but that might have been a mistake since it hasn't actually been accepted as is, and it does indeed give us more trouble than it might be worth. I had hoped some apple or sun hackers would have commented on the upstream bug/discussion, but that hasn't happened :{

> How much of the DTrace patch is actually needed for Systemtap?
> 
> The current patch is here:
> http://cvs.fedoraproject.org/viewvc/rpms/python/devel/python-2.6.4-dtrace.patch?revision=1.1&view=markup
> 
> Going through it hunk by hunk:
>   - configure.in: changes look reasonable, but they don't include the
> "AC_ARG_WITH([tapset-install-dir]," stuff suggested on
> http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps - I wonder if
> they should?  Currently python.spec treats the tapset as a separate source and
> installs them during %install, rather than as part of the Makefile (with some
> logic to cover lib vs lib64 and to ensure the correct dso numbering in the
> path; see:
> http://cvs.fedoraproject.org/viewvc/rpms/python/devel/python.spec?r1=1.169&r2=1.170
> ).  Are tapsets a systemtap-specific concept not found in dtrace? (if so, then
> perhaps it makes more sense to try to merge this upstream as a systemtap patch,
> including the tapsets).

tapsets are systemtap specific yeah.

>   - Include/frameobject.h: adds a f_calllineno field to struct PyFrameObject;
> this change gives me concerns about performance (see below)

Yes, it is only used for backtracing, lets rip it out for now.

>   - Include/phelper.d: (new file): seems to exist in order to extract
> user-space data from the Python process.  Is this DTrace-specific?  Am I right
> in thinking this stuff is done for us by the DWARF-handling hooks in Systemtap?

Yes, this is dtrace specific (there are two kinds of .d files, the probe definition files, which systemtap can also handle, and the dtrace scripts which are like tapsets but dtrace specific - systemtap can also do some of what they do through dwarf crawling indeed).

>   - Include/pydtrace.d: (new file): am I right in thinking that Systemtap's
> implementation of /usr/bin/dtrace uses this?

Yes, that is the probe definition file. The dtrace script creates a .h file from that which defines the macros used in the .c files.

>   - Makefile.pre.in: adds building the headers pydtrace.h and phelper.h, and
> building phelper.o and dtrace.o; again, how much of this do we use with a
> systemtap build? 

We don't use the phelper stuff.

>   - Objects/frameobject.c: sets f->f_calllineno to top of function; shouldn't
> be too much of a perf impact

But we can drop it (see below).

>   - pyconfig.h.in: adds #undef WITH_DTRACE, but also adds a bunch of other
> things that don't appear to be related at first glance

yeah, those look odd, might this really be a generated file?

>   - Python/ceval.c:
>     - Note that this file is the central interpreter loop, and small changes
> here can have a big impact on Python's performance.
>     - Patch adds dtrace_entry and dtrace_return hooks (with what apprears to be
> a __sparc-specific workaround)
>     - Patch does "shenanigans look like utter madness...to make sure that the
> ustack helper will see the PyFrameObject pointer on the stack" which:
>       - on some architectures adds a "PyEval_EvalFrameExReal" separate from
> "PyEval_EvalFrameEx"

Yes, that workaround is because of dtrace sparc seeming to have problems if these functions get tail-called or something. We can remove it for systemtap.

>       - wires up the calls to dtrace_entry/return

This is the only important part for us at this time.

>       - adds calls to PyCode_Addr2Line() to set the newly-added f->f_calllineno
> field (see the change to Include/frameobject.h).  As mentioned in comment #7,
> I'm concerned about these: this is a non-trivial function, and I expect it to
> have an impact on performance.
> 
> The new "f->f_calllineno" field appears to only get used by "this->lineno =
> this->frameo->f_calllineno;" within Include/phelper.d, the "Python ustack
> helper".  If this is DTrace only, can we do a Systemtap-specific patch which
> eliminates this new field?    

Yes, I'll suggest a minimal patch for systemtap usage, that removes this part (which is indeed only for the dtrace ustack support). We might need something similar when we add python backtrace support from systemtap, but we might be able to do it from a tapset (see comment #9)

Comment 12 Dave Malcolm 2010-02-10 20:15:14 UTC
Created attachment 390102 [details]
Notes on benchmarking of python-2.6.4-dtrace.patch

Unfortunately the patch as-is does indeed have a marked negative effect on Python performance

Attached are my notes on profiling a build of the 2.6.4 tarball with and without the patch on my F-12 machine, using Unladen Swallow's benchmark suite:

The results show 5-14% additional time being taken on most benchmarks (nbody turned out faster, but not statistically significant; could have been a fluke, machine _was_ doing some other stuff)

The probe points weren't in use; I believe we're seeing the cost of setting up that f_calllineno in every function call.

Caveat: machine is my laptop and was doing some other tasks (e.g. gnome session with xchat/IRC); however given that most of the numbers show at least 4% regression in performance, I think this is enough to prove the patch needs work before we can turn this on).

Comment 13 Mark Wielaard 2010-02-10 21:04:09 UTC
Created attachment 390110 [details]
Minimal patch for method entry/exit only

Untested minimal patch that just provides method entry/exit probes. Should still work against apple and sun's dtrace (I kept the anti-tailcall sparc in since it seemed harmless).

Comment 14 Dave Malcolm 2010-02-10 22:19:50 UTC
Created attachment 390126 [details]
Notes on benchmarking of Python 2.6.4 patched with attachment 390110 [details]

Attached are notes on a quick run of the unladen-swallow bmarks on my laptop, comparing python tarball 2.6.4 vs 2.6.4 with the patch from attachment 390110 [details] i.e. with the static markers compiled in, but not actually enabled.

Results are much more promising than the original version of patch; most benchmarks show no significant difference in time taken after adding the patch, of the two reporting a difference, django was slightly faster, slowpickle was slightly slower, but this was on a laptop with other things running.

Comment 15 Dave Malcolm 2010-02-10 22:31:22 UTC
I verified that the static markers were actually functioning in my builds (they are) like this (it's a statically-linked build of python as "./python", no libpython here):

[david@brick Python-2.6.4-with-attachment-390110]$ stap -vv -e'probe python.function.entry = process("python").library("./python").mark("function__entry")
{
    filename = user_string($arg1);
    funcname = user_string($arg2);
    lineno = $arg3;
}
probe python.function.entry {printf("%s:%s:%i\n", filename, funcname, lineno);}' -c ./python

Doing so starts up the interactive interpreter, and the probe emits hundreds of messages from all of the python code called at startup (site.py).

I'll rebuild the f-13 python with the new patch

Comment 16 Dave Malcolm 2010-02-10 23:31:10 UTC
(In reply to comment #15)
> I'll rebuild the f-13 python with the new patch    

I've committed Mark's attachment 390110 [details] to dist-cvs; tagged as: python-2_6_4-19_fc13 and built into dist-f13 : http://koji.fedoraproject.org/koji/buildinfo?buildID=155535

Tested on a rawhide box and the tapset/example probe still works:
# rpm -q python python-libs python-debuginfo
python-2.6.4-19.fc13.i686
python-libs-2.6.4-19.fc13.i686
python-debuginfo-2.6.4-19.fc13.i686
# stap -v /usr/share/doc/python-libs-2.6.4/systemtap-example.stp -c yum help | head
Pass 1: parsed user script and 66 library script(s) using 20224virt/12248res/2040shr kb, in 150usr/10sys/160real ms.
Pass 2: analyzed script: 2 probe(s), 14 function(s), 2 embed(s), 2 global(s) using 25184virt/14572res/3380shr kb, in 20usr/0sys/14real ms.
Pass 3: using cached /root/.systemtap/cache/5a/stap_5a80297603ac4434b77b22e6f4127f00_5903.c
Pass 4: using cached /root/.systemtap/cache/5a/stap_5a80297603ac4434b77b22e6f4127f00_5903.ko
Pass 5: starting run.
     0 yum(23287): => <module> in /usr/lib/python2.6/site.py:59
   439 yum(23287):  => <module> in /usr/lib/python2.6/os.py:22
  1021 yum(23287):   => <module> in /usr/lib/python2.6/posixpath.py:11
  1146 yum(23287):    => <module> in /usr/lib/python2.6/stat.py:4
  1163 yum(23287):    <= <module> in /usr/lib/python2.6/stat.py:94
  1272 yum(23287):    => <module> in /usr/lib/python2.6/genericpath.py:5
  1292 yum(23287):    <= <module> in /usr/lib/python2.6/genericpath.py:85
  1483 yum(23287):    => <module> in /usr/lib/python2.6/warnings.py:1
  1677 yum(23287):     => <module> in /usr/lib/python2.6/linecache.py:6
  1698 yum(23287):     <= <module> in /usr/lib/python2.6/linecache.py:68
ERROR: Couldn't write to output 1 for cpu 0, exiting.: Success
Pass 5: run completed in 40usr/90sys/146real ms.

(error above is due to the use of "head" in the shell pipeline)

Comment 17 Dave Malcolm 2010-02-12 22:08:58 UTC
I've ported the patch to Python 3:
http://cvs.fedoraproject.org/viewvc/rpms/python3/devel/python-3.1.1-systemtap.patch?revision=1.1&view=markup

and built with it as python3-3.1.1-25.fc13 ; I've tested it briefly and update the Feature Page accordingly

Comment 18 Dave Malcolm 2010-05-27 15:53:53 UTC
Not sure why this is still open; this is done, for both python and python3 for Fedora 13.  Potentially we could add more static probepoints, but that seems like a future feature (and a separate bug).

Closing.


Note You need to log in before you can comment on or make changes to this bug.