Bug 896281
Summary: | Stop using clone(2) and use fork() instead | ||
---|---|---|---|
Product: | [Community] Virtualization Tools | Reporter: | Eric Paris <eparis> |
Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | unspecified | CC: | berrange, clalancette, crobinso, dwalsh, itamar, jakub, jforbes, laine, libvirt-maint, rbalakri, veillard, virt-maint |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-03-23 21:58:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Eric Paris
2013-01-16 23:10:51 UTC
I'm afraid we can't do this. It is not possible use CLONE_NEWPID with unshare(). New PID namespaces can only be started with clone(). WRT to clone() + getpid() the man page disagrees wtih what you describe "Versions of the GNU C library that include the NPTL threading library contain a wrapper function for getpid(2) that performs caching of PIDs. This caching relies on support in the glibc wrapper for clone(), but as currently implemented, the cache may not be up to date in some circumstances. In particular, if a sig‐ nal is delivered to the child immediately after the clone() call, then a call to getpid(2) in a handler for the signal may return the PID of the calling process ("the parent"), if the clone wrapper has not yet had a chance to update the PID cache in the child. " we don't have a case where a signal handler will run, so that doesn't matter to us. We have not encountered any problems with glibc in our current usage of clone. What is the specific problem libselinux is seeing ? setfscreatecon basically does: static __thread char *cachedvalue; void clear(void) { cachedvalue = NULL; } int setfscreatecon(char *newvalue) { pthread_atfork(NULL, NULL, clear); // clear TLS on fork() if (!strncmp(newvalue, cachedvalue)) return 0; set_new_value_in_kernel(newvalue); cachedvalue = newvalue; } Now libvirt uses the setfscreatecon (and getfscreatecon) which will set the cachedvalue. On clone(2) there is no way to UNSET the cachedvalue in the userspace library, even though in the kernel the new task will not have a value set. So when the child uses get/set it will be against the CACHED value, which is inconsistent with the kernel state. glibc provides no mechansim to do the cleanup... If you use fork() glibc will call out to the pthread_atfork() -> clear() function. If you use pthreads, glibc will take care of the variable because it is __thread. But with clone, you just get the same old value. Hoping the libc people will help me find a better way, but there is no atclone() equivalent to pthread_atfork(). As of v3.8 the kernel supports unshare(CLONE_NETPID) commit 50804fe37 I will implement a workaround in the library to try to make it usually work with your current code __thread pid_t lasttid; int setfscreatecon(char *newvalue) { pid_t new_tid; new_tid = syscall(__NR_gettid); if (newtid != lasttid) { clear(); lasttid = newtid; } [...] } This isn't perfect as tid wrap around means we could still mess it up. But it'll work most of the time... Why is libselinux attempting todo such caching ? I tend to feel that libselinux ought to be a direct mapping to the kernel corresponding syscalls, and if apps need caching they ought to do it at their code level We found 2 users with significant enough usage to make the caching beneficial, udev/systemd and coreutils (namely install). Once we have 2 users needing to do the exact same thing it starts to look reasonable to try to do the work at the lower level rather than in each program. I'll put my workaround in the library, but I'd really like to see libvirt move to fork()+unshare() (like systemd) (In reply to comment #4) > We found 2 users with significant enough usage to make the caching > beneficial, udev/systemd and coreutils (namely install). > > Once we have 2 users needing to do the exact same thing it starts to look > reasonable to try to do the work at the lower level rather than in each > program. > > I'll put my workaround in the library, but I'd really like to see libvirt > move to fork()+unshare() (like systemd) systemd only uses unshare() when it is giving system services a private network or filesystem namespace. For the systemd nsspawn command, it uses clone in the same way as libvirt. Much as I'd like to use unshare(), we need to use CLONE_NEWPID, so clone() is our only option. I understand your situation. CLONE_NEWPID does work with unshare in kernel 3.8. But nothing but rawhide has that right now... Moving to the upstream tracker. 3.8 is available generally in fedora now, but it doesn't sound like the current libvirt way of doing things is causing obvious problems. What's the conclusion here? lxc still uses clone() (as does libcontainer for example). I don't think this is ever going to change, so closing |