Bug 634242
Summary: | stap script generates errors when using clone() with various namespace flags | |||
---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Daniel Berrangé <berrange> | |
Component: | systemtap | Assignee: | David Smith <dsmith> | |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 14 | CC: | dsmith, fche, jistone, mjw, mjw, roland, scox, wcohen | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 643866 (view as bug list) | Environment: | ||
Last Closed: | 2011-01-26 17:37:39 UTC | Type: | --- | |
Description
Daniel Berrangé
2010-09-15 15:53:55 UTC
I'm sure it's CLONE_NEWPID. uprobes internals has pid lookups, which is fundamentally broken. It needs to be fixed to use task_struct or struct pid as keys or something like that.

roland advises this may be sufficient. There are two other places in stap where find_task_by_pid is used; it would be good to check whether those need changing too.

    diff --git a/runtime/uprobes/uprobes.c b/runtime/uprobes/uprobes.c
    index 403de18..3f76ec6 100644
    --- a/runtime/uprobes/uprobes.c
    +++ b/runtime/uprobes/uprobes.c
    @@ -876,7 +876,7 @@ static struct task_struct *uprobe_get_task(pid_t pid)
     {
         struct task_struct *p;
         rcu_read_lock();
    -    p = find_task_by_pid(pid);
    +    p = find_task_by_pid_ns(pid, &init_pid_ns);
         if (p)
             get_task_struct(p);
         rcu_read_unlock();

I made that change to /usr/share/systemtap/runtime/uprobes/uprobes.c and deleted the cached kernel module so it re-built, but nothing appears to change. Is that file really still used ? AFAICT, only the files in runtime/uprobes2/ are actually being compiled on my current host.

(In reply to comment #3)
> I made that change to /usr/share/systemtap/runtime/uprobes/uprobes.c and
> deleted the cached kernel module so it re-built, but nothing appears to change.
> Is that file really still used ? AFAICT, only the files in runtime/uprobes2/
> are actually being compiled on my current host.

Yes, you are right, on modern kernels only uprobes2 is used. It seems this is also not fully namespace aware (it uses find_vpid for example which I believe isn't namespace aware). But I am not sure what all the necessary changes are to make it so.

Here's a small update. I've duplicated this problem. I've also discovered that specifying 'CLONE_NEWPID' (as was suspected) is certainly the problem. Without 'CLONE_NEWPID', the error doesn't occur.

I've fixed this upstream in several commits.

86229a5 fixes the problem for current kernels:
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=86229a5533de13b6ac6eeb34d9ea24e7cfb64faa>

e5a338c fixed the problem for rhel5-era kernels:
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=e5a338c3a2aeb1d5dfa27f4d30dd04bfd8c61ce4>

0ac3dce added a test case that tests those CLONE_* flags.
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=patch;h=0ac3dce18d9bcadff5f2f5f9274a7b40889d1d1a>

There were actually 2 related problems here:

- When CLONE_NEWPID was used, systemtap was looking for the pid in the private pid namespace, not the public one.
- When CLONE_VM was used, uprobe probes got removed in the newly cloned process.

I have confirmed that changeset 86229a5 applied to the current F13 RPM fixes the problem I see with libvirt + LXC.

systemtap 1.4 includes the above fixes, and is available in rawhide and in update-testing for earlier fedoras.