Bug 1303031 - libvirtd hang since fork() was called while another thread had security manager locked

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | yafu <yafu> |
| Component: | libvirt | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED CANTFIX | QA Contact: | yafu <yafu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 | CC: | dyuan, fjin, jsuchane, mprivozn, rbalakri, rjones, xuzhang, yafu |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-05 15:18:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1401400 | | |
Description (yafu, 2016-01-29 11:22:44 UTC):
Comment 1 (Richard W.M. Jones):

We haven't seen any hangs recently, and this was tested against a very old version of libvirt.

Is it possible to retest this using RHEL 7.4 / 7.5 libvirt to see if it still happens (even rarely, after 20+ hours)?

Otherwise I suggest closing this; if we find problems with locking we can reopen it or open a new bug.

Comment 2:

(In reply to Richard W.M. Jones from comment #1)
> We haven't seen any hangs recently, and this was tested against
> a very old version of libvirt.
>
> Is it possible to retest this using RHEL 7.4 / 7.5 libvirt to see
> if it still happens (even rarely after 20+ hours)?
>
> Otherwise I suggest closing this and if we find problems with locking
> we can reopen or open a new bug.

I retested the bug with libvirt-3.8.0-1.el7.x86_64. libvirtd hangs within about 5 minutes, and its backtrace is the same as in comment 0.

Comment 3:

I've looked at the code and the issue has not been fixed yet: qemuProcessHook calls qemuSecurityPostFork, which is supposed to force-unlock the security manager for use in the forked process. The problem is that since we have the "stack" security driver, which internally manages other security drivers, the only manager that gets unlocked is the top one, but none of the nested ones. Thus, if one of the nested drivers ("dac" or "selinux") is still locked, the above "workaround" for post-fork locking will not help. This means qemuSecurityPostFork must be fixed so that it also iterates through all the nested security drivers and unlocks the manager object for every single nested driver.

Comment 4:

Okay, so the code actually locks (and later unlocks) the nested drivers as well, and it forks only after both are locked. That would mean the above scenario can happen only if one of the nested drivers were already in use; I haven't found a code path for that yet.

Comment 5:

I don't think this is a solvable problem. One might suggest pthread_atfork() to unlock all mutexes in the child, but the problem with that approach is that the majority of our mutexes are not global (as in global variables); rather, they are contained in the structures they guard. Therefore it may not be possible to unlock all mutexes in the (child) handler of pthread_atfork(). And even if it were, what about all the libraries libvirt is linked with? Not to mention that atfork handlers are not called when a child process is created via clone(). Sorry.
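For readers unfamiliar with the failure class discussed in this thread, the following is a minimal, self-contained sketch (not libvirt code) of how fork() interacts badly with a mutex held by another thread: the child inherits the locked mutex but not the thread that would unlock it, so any lock attempt in the child blocks forever.

```c
/* Minimal sketch (not libvirt code) of the hang described above: a
 * helper thread holds a mutex when the main thread calls fork(); the
 * child inherits the locked mutex but not the holder thread, so
 * pthread_mutex_lock() in the child never returns.
 * Compile with: cc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);   /* stands in for the security manager lock */
    sleep(3600);                 /* keep the lock held "forever" */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, holder, NULL);
    sleep(1);                    /* let the holder acquire the lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this thread exists here; the lock's owner is gone. */
        fprintf(stderr, "child: trying to take the lock...\n");
        pthread_mutex_lock(&lock);   /* blocks forever */
        fprintf(stderr, "child: never reached\n");
        _exit(0);
    }
    waitpid(pid, NULL, 0);           /* parent waits on the hung child */
    return 0;
}
```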
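The fix proposed in comment 3 — having the post-fork hook unlock every nested manager, not just the top-level "stack" one — could look roughly like the sketch below. All names here (SecurityManager, nested, n_nested, security_manager_post_fork) are invented for illustration and are not libvirt's actual internal API.

```c
/* Hypothetical sketch of the iteration proposed in comment 3: after
 * fork(), force-unlock every nested manager wrapped by the "stack"
 * driver, then the top-level manager itself.  Types and names are
 * illustrative only, not libvirt internals.
 */
#include <pthread.h>
#include <stddef.h>

typedef struct SecurityManager SecurityManager;
struct SecurityManager {
    pthread_mutex_t lock;        /* per-manager lock taken before fork() */
    SecurityManager **nested;    /* non-NULL for the "stack" driver */
    size_t n_nested;             /* e.g. the "dac" and "selinux" drivers */
};

void security_manager_post_fork(SecurityManager *mgr)
{
    /* Unlock the nested managers first ... */
    for (size_t i = 0; i < mgr->n_nested; i++)
        pthread_mutex_unlock(&mgr->nested[i]->lock);

    /* ... then the top-level one.  Per comment 3, only this last
     * unlock was happening, leaving nested managers locked. */
    pthread_mutex_unlock(&mgr->lock);
}
```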
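The pthread_atfork() approach rejected in comment 5 only works for locks the handlers can name, since the handlers take no arguments. The sketch below shows the mechanism and its limitation; the obj_t type is invented here for illustration.

```c
/* Sketch of why pthread_atfork() (comment 5) helps only with global
 * mutexes: the prepare/parent/child handlers are argument-less
 * functions, so they can reach a global lock but not the per-object
 * locks embedded in heap-allocated structures.
 */
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    pthread_mutex_t lock;   /* one lock per object, as in most of libvirt */
    int state;
} obj_t;

static void prepare(void) { pthread_mutex_lock(&global_lock); }
static void parent(void)  { pthread_mutex_unlock(&global_lock); }
static void child(void)
{
    pthread_mutex_unlock(&global_lock);
    /* There is no way to enumerate every live obj_t from here, so
     * their locks stay in whatever state they had at fork().  And per
     * comment 5, these handlers do not run at all for children
     * created via clone(). */
}

int main(void)
{
    pthread_atfork(prepare, parent, child);

    obj_t *o = calloc(1, sizeof(*o));
    pthread_mutex_init(&o->lock, NULL);
    /* ... other threads take o->lock; a fork() now may strand it locked ... */
    return 0;
}
```

This is the core of the CANTFIX resolution: even a perfect atfork handler inside libvirt could not reach per-object locks, let alone locks inside the libraries libvirt links against.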