Bug 717136 - markers in DSO do not get called
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Josh Stone
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-28 06:19 UTC by Daiki Ueno
Modified: 2011-08-10 03:19 UTC

Fixed In Version: systemtap-1.6-1.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-10 03:19:33 UTC
Type: ---
Embargoed:


Attachments
prelinked libraries, missing +uprobes (2.45 KB, text/plain)
2011-07-08 15:24 UTC, Frank Ch. Eigler
prelink -u'd, +uprobes working (2.72 KB, text/plain)
2011-07-08 15:24 UTC, Frank Ch. Eigler
Fix SDT relocations in prelinked modules (2.45 KB, patch)
2011-07-14 23:05 UTC, Josh Stone

Description Daiki Ueno 2011-06-28 06:19:38 UTC
Description of problem:
The systemtap probes defined in gobject.stp do not get called.

cf.
http://tecnocode.co.uk/2010/07/13/reference-count-debugging-with-systemtap/
http://blog.verbum.org/2011/03/19/analyzing-memory-use-with-systemtap/

Version-Release number of selected component (if applicable):
glib2-devel-2.28.8-1.fc15.x86_64
systemtap-1.4-9.fc15.x86_64

How reproducible:
Always

Steps to Reproduce:
1. create a file testgobject.c with the following content:
 #include <glib-object.h>

 int main (int argc, char **argv) {
   GObject *object;
   g_type_init ();
   object = g_object_new (G_TYPE_OBJECT, NULL);
   g_object_unref (object);
   return 0;
 }

2. gcc -O0 -g -o testgobject testgobject.c `pkg-config gobject-2.0 --cflags --libs`
3. stap --ldd -e 'probe gobject.object_new {printf("object_new\n");}' -c ./testgobject -d ./testgobject
  
Actual results:
No output

Expected results:
"object_new" is shown

Additional info:
/usr/share/systemtap/tapset/gobject.stp defines the probe as follows:

 probe gobject.object_new = process("/lib64/libgobject-2.0.so.0.2800.8").mark("object__new")
 {
 ...
 }

The marker is actually available:

 $ stap -L 'process("/lib64/libgobject-2.0.so.0.2800.8").mark("object__new")'
 process("/lib64/libgobject-2.0.so.0.2800.8").mark("object__new") $arg1:long $arg2:long

However, if I manually specify the marker like:

 $ stap -e 'probe process("/lib64/libgobject-2.0.so.0.2800.8").mark("object__new") {printf("object_new\n");}' -d ./testgobject -c ./testgobject

I get no result, while the function variant works:

 $ stap -e 'probe process("/lib64/libgobject-2.0.so.0.2800.8").function("g_object_new") {printf("object_new\n");}' -d ./testgobject -c ./testgobject 
 object_new

I guess this may be a systemtap problem.

Comment 1 Daiki Ueno 2011-06-30 01:54:38 UTC
I see a similar issue with libpython2.7-64.stp.
Reassigning to systemtap.

Comment 2 Josh Stone 2011-07-05 20:28:31 UTC
The given gobject test works for me.  However, in building uprobes.ko, there is a "warning: use of memory input without lvalue in asm operand 1 is deprecated".  This was previously seen to cause some unpredictable behavior, and was fixed in upstream commit 0746c987, included in version 1.5.  So that might explain your weird results.  Can you try systemtap 1.5 from updates-testing?

Comment 3 Frank Ch. Eigler 2011-07-08 15:23:20 UTC
I believe I'm seeing this problem, and it looks like another prelink-related
one. Sigh.  

In my case, on Fedora 15 with stap from git, the
testsuite/systemtap.exelib/pthreadprobes.exp test case fails, with none of the
probes hitting, even though libc/libpthread have the requisite markers.

Daiki, would you be able to try
% su -c "prelink -u /lib64/libgobject-2.0.so.0.2800.8"
and then rerun?

Comment 4 Frank Ch. Eigler 2011-07-08 15:24:24 UTC
Created attachment 511947 [details]
prelinked libraries, missing +uprobes

Comment 5 Frank Ch. Eigler 2011-07-08 15:24:49 UTC
Created attachment 511948 [details]
prelink -u'd, +uprobes working

Comment 6 Frank Ch. Eigler 2011-07-08 19:46:58 UTC
In the prelinked case, stap creates stap_probe instances of the form: 
   { .address=(unsigned long)0x303e208588ULL, .probe=(&stap_probes[9]),     
     .sdt_sem_offset=(unsigned long)0x303e200000ULL, },
which are obviously in need of some offsetting down to a more proper range
like:
   { .address=(unsigned long)0x65dbULL, .probe=(&stap_probes[1]), },
(Note also that the former case falsely implicates semaphores.)
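
A minimal sketch of the kind of offsetting described above; it is not stap's actual code. It assumes, purely for illustration, that 0x303e200000 (the value that shows up in .sdt_sem_offset) is the library's prelinked base address. The helper name to_module_offset is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not part of systemtap): turn an absolute,
 * prelinked address into an offset relative to the start of the module.
 * ASSUMPTION: prelink_base is the address the library was prelinked to. */
uint64_t to_module_offset(uint64_t absolute_addr, uint64_t prelink_base)
{
    return absolute_addr - prelink_base;
}
```

With the values from the prelinked case, to_module_offset(0x303e208588ULL, 0x303e200000ULL) gives 0x8588, a small module-relative value of the same order as the 0x65db seen after "prelink -u". The two entries refer to different probes, so the values are not expected to match.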

Comment 7 Daiki Ueno 2011-07-11 01:08:14 UTC
Sorry for the late response.

(In reply to comment #3)
> Daiki, would you be able to try
> % su -c "prelink -u /lib64/libgobject-2.0.so.0.2800.8"
> and then rerun?

Yes, this did the trick.

Comment 8 Josh Stone 2011-07-12 20:39:46 UTC
I wasn't able to reproduce this before, but I didn't have glib2-debuginfo.  With that installed, I can now reproduce the issue.  The .address looks sane without debuginfo, but is out of range with debuginfo (like it's absolute rather than relative).  The .sdt_sem_offset looks wrong both ways though.

In the past we've had problems with elfutils lining up prelinked binaries with their debuginfo.  However, since the addresses of function probes look correct here, and only sdt probes are wrong, it may well be an issue in stap's processing.

Comment 9 Josh Stone 2011-07-12 21:40:21 UTC
I believe the difference is this in sdt_query::handle_probe_entry():

  Dwarf_Addr bias;
  Elf* elf = (dwarf_getelf (dwfl_module_getdwarf (dw.mod_info->mod, &bias))
	      ?: dwfl_module_getelf (dw.mod_info->mod, &bias));
...
	  Dwarf_Addr reloc_addr = q.statement_num_val + bias;

The statement_num_val is the same either way, something like 0x38bbc0f5e6.  Without debuginfo (thus the latter part of the "?:") we get bias=0x200000.  With debuginfo, we get bias=0x38bbe00000, and then the reloc_addr basically ends up with the prelinked base address added twice (0x7177a0f536).  Thus in the following lines that are supposed to use dwfl_module_relocate_address() to get something relative, it only subtracts once and we end up with the wrong value.

Since the probe address is coming from the elf, not dwarf, I think the right fix will be to always use the bias from dwfl_module_getelf.  But that's only true of sdt v3 -- for older compatibility I'll have to think more...
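
The arithmetic in this comment can be sketched as follows. This is only a model under stated assumptions, not the stap source: biased_addr mirrors the "statement_num_val + bias" line quoted above, while relocate_once is a hypothetical stand-in for the relocation step and does not use the signature of dwfl_module_relocate_address(). It further assumes that relocation subtracts a module base of 0x38bbe00000, i.e. the same value as the debuginfo bias. The inputs are the approximate ("something like") values quoted above, so the low bits of the results may differ slightly from the figures in the comment.

```c
#include <stdint.h>

/* Mirrors "reloc_addr = q.statement_num_val + bias" from the snippet above. */
uint64_t biased_addr(uint64_t statement_num_val, uint64_t bias)
{
    return statement_num_val + bias;
}

/* Hypothetical stand-in for the relocation step.
 * ASSUMPTION: relocation subtracts the module base exactly once. */
uint64_t relocate_once(uint64_t addr, uint64_t module_base)
{
    return addr - module_base;
}
```

With the ELF bias, relocate_once(biased_addr(0x38bbc0f5e6ULL, 0x200000ULL), 0x38bbe00000ULL) yields 0xf5e6, a module-relative offset. With the debuginfo bias, biased_addr(0x38bbc0f5e6ULL, 0x38bbe00000ULL) is 0x7177a0f5e6, and subtracting the base once leaves 0x38bbc0f5e6, which is still an absolute prelinked address. That matches the "absolute rather than relative" symptom from comment 8.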

Comment 10 Josh Stone 2011-07-14 23:05:30 UTC
Created attachment 513288 [details]
Fix SDT relocations in prelinked modules

This works with everything I've thrown at it so far, including some exelib.exp tests that were failing and hadn't been investigated yet. :)

Comment 11 Josh Stone 2011-07-15 21:20:33 UTC
(In reply to comment #10)
> Created attachment 513288 [details]
> Fix SDT relocations in prelinked modules

Upstream commit 7d395255.

Until we get this out to a release, "prelink -u" is the best workaround.

Comment 12 Fedora Update System 2011-08-02 20:43:58 UTC
systemtap-1.6-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/systemtap-1.6-1.fc15

Comment 13 Fedora Update System 2011-08-03 02:35:58 UTC
Package systemtap-1.6-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemtap-1.6-1.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemtap-1.6-1.fc15
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2011-08-10 03:19:26 UTC
systemtap-1.6-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

