Bug 634242 - stap script generates errors when using clone() with various namespace flags
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: systemtap
Version: 14
Hardware: All Linux
Priority: low, Severity: medium
Assigned To: David Smith
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-09-15 11:53 EDT by Daniel Berrange
Modified: 2011-01-26 12:37 EST
Doc Type: Bug Fix
Clones: 643866
Last Closed: 2011-01-26 12:37:39 EST
Description Daniel Berrange 2010-09-15 11:53:55 EDT
Description of problem:
I am testing the 'client.stp' script that is part of the libvirtd dtrace/systemtap support in

  https://bugzilla.redhat.com/show_bug.cgi?id=552387

When libvirtd starts up, the script prints many errors:

WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x415b6f)' addr 0000000000415b6f rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x414d65)' addr 0000000000414d65 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x415b78)' addr 0000000000415b78 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x414249)' addr 0000000000414249 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x414259)' addr 0000000000414259 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x413ff2)' addr 0000000000413ff2 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x415b8e)' addr 0000000000415b8e rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x41634e)' addr 000000000041634e rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x422a20)' addr 0000000000422a20 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x4235fe)' addr 00000000004235fe rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x4239ce)' addr 00000000004239ce rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x4229e0)' addr 00000000004229e0 rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x42358b)' addr 000000000042358b rc -3
WARNING: u*probe failed libvirtd[8155] 'process("/usr/sbin/libvirtd").statement(0x42395b)' addr 000000000042395b rc -3


By inserting some fprintf + sleep calls into libvirtd source, I determined that it seems to be the LXC driver clone() calls that trigger these problems.

Specifically this code in libvirt appears to make stap unhappy:



static int lxcContainerDummyChild(void *argv ATTRIBUTE_UNUSED)
{
    _exit(0);
}

int lxcContainerAvailable(int features)
{
    int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|
        CLONE_NEWIPC|SIGCHLD;
    int cpid;
    char *childStack;
    char *stack;
    int childStatus;

    if (features & LXC_CONTAINER_FEATURE_USER)
        flags |= CLONE_NEWUSER;

    if (features & LXC_CONTAINER_FEATURE_NET)
        flags |= CLONE_NEWNET;

    if (VIR_ALLOC_N(stack, getpagesize() * 4) < 0) {
        DEBUG0("Unable to allocate stack");
        return -1;
    }

    childStack = stack + (getpagesize() * 4);

    cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);
    VIR_FREE(stack);
    if (cpid < 0) {
        char ebuf[1024];
        DEBUG("clone call returned %s, container support is not enabled",
              virStrerror(errno, ebuf, sizeof ebuf));
        return -1;
    } else {
        waitpid(cpid, &childStatus, 0);
    }

    return 0;
}


I expect that one of those CLONE_* flags violates some assumption that systemtap makes.
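
One way to narrow down which flag is responsible would be to repeat the same dummy-child clone() once per flag. The following is only a minimal sketch under that assumption; it is not taken from libvirt or from the systemtap test suite, and the names dummy_child and try_clone are hypothetical. The namespace flags generally need root (CAP_SYS_ADMIN), so a negative return value may just mean insufficient privilege.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical helper: the child does nothing and exits with status 0. */
static int dummy_child(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: clone a dummy child with SIGCHLD plus extra_flags
 * and wait for it.  Returns 0 if the child ran and exited cleanly,
 * -errno if clone() or waitpid() failed, and -ECHILD if the child
 * terminated abnormally.
 */
static int try_clone(int extra_flags)
{
    char *stack = malloc(STACK_SIZE);
    int status = 0;
    int ret = 0;
    pid_t cpid;

    if (stack == NULL)
        return -ENOMEM;

    /* The stack grows downwards on x86, so pass the top of the allocation. */
    cpid = clone(dummy_child, stack + STACK_SIZE, extra_flags | SIGCHLD, NULL);
    if (cpid < 0)
        ret = -errno;
    else if (waitpid(cpid, &status, 0) < 0)
        ret = -errno;
    else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ret = -ECHILD;

    free(stack);
    return ret;
}
```

Calling try_clone(CLONE_NEWPID), try_clone(CLONE_NEWNS) and so on, one at a time, from a program that the stap script is probing should show which flag makes the warnings appear.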


Version-Release number of selected component (if applicable):
systemtap-1.3-2.fc13.x86_64
kernel 2.6.34.6-54.fc13.x86_64
libvirt git latest + patches from the BZ above.

How reproducible:
Always, if libvirtd has its 'LXC' driver enabled.

Steps to Reproduce:
1. In one shell run 'stap client.stp' from bz 552387
2. In another shell run /usr/sbin/libvirtd as root, again with patches from bz 552387
Actual results:
Missing probe warnings

Expected results:
No probe warnings

Additional info:
Comment 1 Roland McGrath 2010-09-15 13:56:08 EDT
I'm sure it's CLONE_NEWPID.  The uprobes internals do pid lookups, which is fundamentally broken.  They need to be fixed to use task_struct or struct pid as keys, or something like that.
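
To illustrate why a bare pid number is an unreliable key once pid namespaces are involved, here is a toy user-space model. It is not kernel or uprobes code: struct toy_task, toy_find_by_nr, the task table, and the pid 8160 are made up for illustration (8155 is the libvirtd pid from the warnings in the description).

```c
#include <stddef.h>

#define TOY_NS_LEVELS 2

/*
 * Toy stand-in for a task.  nr[0] is its pid as seen from the initial
 * namespace, nr[1] its pid as seen from a child pid namespace
 * (0 means "not visible from that namespace").
 */
struct toy_task {
    const char *name;
    int nr[TOY_NS_LEVELS];
};

/* Hypothetical example data: a cloned child that is pid 1 in its own namespace. */
static struct toy_task toy_tasks[] = {
    { "init",            { 1,    0 } },
    { "libvirtd",        { 8155, 0 } },
    { "container-child", { 8160, 1 } },
};

/* Look up a task by pid number as seen from namespace level ns. */
static struct toy_task *toy_find_by_nr(struct toy_task *tasks, size_t n,
                                       int nr, int ns)
{
    if (nr <= 0 || ns < 0 || ns >= TOY_NS_LEVELS)
        return NULL;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].nr[ns] == nr)
            return &tasks[i];
    return NULL;
}
```

In this model, looking up pid 1 at level 0 finds "init", while the same number at level 1 finds "container-child"; looking up 8160 at level 1 finds nothing. A key such as the task pointer itself does not depend on which namespace the lookup runs in.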
Comment 2 Frank Ch. Eigler 2010-09-17 10:37:46 EDT
Roland advises that this may be sufficient.  There are two other places in stap
where find_task_by_pid is used; it would be good to check whether those
need changing too.

diff --git a/runtime/uprobes/uprobes.c b/runtime/uprobes/uprobes.c
index 403de18..3f76ec6 100644
--- a/runtime/uprobes/uprobes.c
+++ b/runtime/uprobes/uprobes.c
@@ -876,7 +876,7 @@ static struct task_struct *uprobe_get_task(pid_t pid)
 {
 	struct task_struct *p;
 	rcu_read_lock();
-	p = find_task_by_pid(pid);
+	p = find_task_by_pid_ns(pid, &init_pid_ns);
 	if (p)
 		get_task_struct(p);
 	rcu_read_unlock();
Comment 3 Daniel Berrange 2010-09-17 13:17:29 EDT
I made that change to /usr/share/systemtap/runtime/uprobes/uprobes.c and deleted the cached kernel module so that it was rebuilt, but nothing appears to have changed. Is that file really still used? AFAICT, only the files in runtime/uprobes2/ are actually being compiled on my current host.
Comment 4 Mark Wielaard 2010-09-19 10:23:16 EDT
(In reply to comment #3)
> I made that change to /usr/share/systemtap/runtime/uprobes/uprobes.c and
> deleted the cached kernel module so it re-built, but nothing appears to change.
> Is that file really still used ?  AFAICT, only the files in runtime/uprobes2/
> are actually being compiled on my current host.

Yes, you are right, on modern kernels only uprobes2 is used. It seems this is also not fully namespace aware (it uses find_vpid, for example, which I believe isn't namespace aware). But I am not sure what changes are needed to make it so.
Comment 5 David Smith 2010-09-20 14:42:53 EDT
Here's a small update.  I've duplicated this problem.  I've also discovered that specifying 'CLONE_NEWPID' (as was suspected) is certainly the problem.  Without 'CLONE_NEWPID', the error doesn't occur.
Comment 6 David Smith 2010-10-06 15:23:12 EDT
I've fixed this upstream in several commits.

86229a5 fixes the problem for current kernels:

<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=86229a5533de13b6ac6eeb34d9ea24e7cfb64faa>

e5a338c fixed the problem for rhel5-era kernels:

<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=e5a338c3a2aeb1d5dfa27f4d30dd04bfd8c61ce4>

0ac3dce added a test case that tests those CLONE_* flags.

<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=0ac3dce18d9bcadff5f2f5f9274a7b40889d1d1a>

There were actually 2 related problems here:

- When CLONE_NEWPID was used, systemtap was looking for the pid in the private pid namespace, not the public one.

- When CLONE_VM was used, uprobe probes got removed in the newly cloned process.
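
For context on the second item: CLONE_VM makes the child share the parent's address space instead of getting its own copy. The sketch below only demonstrates that sharing with a plain clone() call; it does not reproduce the uprobes behaviour, and write_marker and marker_after_clone are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

/* Child: write a marker value through the pointer it was handed. */
static int write_marker(void *arg)
{
    *(int *)arg = 42;
    return 0;
}

/*
 * Hypothetical helper: clone a child with SIGCHLD plus extra_flags, let it
 * write 42 into a parent-owned int, and return the value the parent sees
 * once the child has exited.  Returns -1 on failure.
 */
static int marker_after_clone(int extra_flags)
{
    char *stack = malloc(STACK_SIZE);
    int *marker = calloc(1, sizeof(*marker));
    int result = -1;
    pid_t cpid;

    if (stack != NULL && marker != NULL) {
        cpid = clone(write_marker, stack + STACK_SIZE,
                     extra_flags | SIGCHLD, marker);
        /* Wait before reading or freeing: with CLONE_VM the child uses this memory. */
        if (cpid >= 0 && waitpid(cpid, NULL, 0) == cpid)
            result = *marker;
    }

    free(marker);
    free(stack);
    return result;
}
```

With CLONE_VM the parent sees the child's write (marker_after_clone(CLONE_VM) yields 42); without it the child writes to its own copy and the parent's value stays 0.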
Comment 7 Daniel Berrange 2010-10-18 06:39:23 EDT
I have confirmed that changeset 86229a5 applied to the current F13 RPM fixes the problem I see with libvirt + LXC.
Comment 8 Frank Ch. Eigler 2011-01-26 12:37:39 EST
systemtap 1.4 includes the above fixes, and is available in rawhide and in updates-testing for earlier Fedora releases.
