Bug 673543

Summary: libvirt can deadlock when spawning child processes
Product: Red Hat Enterprise Linux 6
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.0
CC: dallan, eblake, xen-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2011-11-23 20:06:31 UTC

Description Eric Blake 2011-01-28 16:33:07 UTC
Description of problem:
POSIX is explicit that a multi-threaded application can safely call only async-signal-safe functions between fork() and exec().  The fork may happen while some other thread holds a mutex, such as the malloc mutex.  The child contains a copy of only the forking thread, so no thread exists in the child to release that mutex, and anything the child does that tries to acquire it, such as calling malloc(), can deadlock.
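
To make the constraint concrete, here is a minimal sketch of a child path that stays within async-signal-safe calls (execv, write, _exit).  The helper name spawn_and_wait() and the fixed error message are illustrative assumptions, not libvirt code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of libvirt: run path with argv and
 * return the child's exit status, or -1 on a parent-side failure. */
static int spawn_and_wait(const char *path, char *const argv[])
{
    /* Fixed message prepared at compile time, so the child never
     * has to allocate or format anything. */
    static const char msg[] = "exec failed\n";
    int status;
    pid_t pid;

    assert(path != NULL && argv != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. */
        execv(path, argv);

        /* exec failed: no malloc(), no stdio, no localtime_r(). */
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)ignored;
        _exit(127);
    }

    /* Parent: ordinary code again; retry if a signal interrupts us. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything that needs formatting or allocation (building argv, composing a detailed error message) is done in the parent, either before fork() or after waitpid() returns.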

Version-Release number of selected component (if applicable):
libvirt-v0.8.7-4.el6 (all existing RHEL libvirt releases)

How reproducible:
rare, found by code inspection rather than by an actual deadlock

Steps to Reproduce:
It may be possible to force the situation by using gdb to block one thread inside malloc(), then invoking another libvirt function that spawns a child process and hits an error that makes libvirt call malloc() to report it.  However, I have not yet tried to construct such a test scenario.
  
Actual results:
Potential for deadlock.

Expected results:
Libvirt should obey the POSIX rules of only using async-signal-safe functions between fork and exec.
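
One common way to meet this expectation while still reporting useful errors is to pass the child's errno back to the parent over a close-on-exec pipe, so that all formatting happens in the parent.  The sketch below is an assumption about how that could look, not the fix that libvirt adopted; spawn_report_errno() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of libvirt.  Returns 0 if exec
 * succeeded, the child's errno if exec failed, or -1 if the parent
 * itself hit an error.  The child is always reaped. */
static int spawn_report_errno(const char *path, char *const argv[])
{
    int fds[2];
    int child_errno = 0;
    ssize_t n;
    pid_t pid;

    assert(path != NULL && argv != NULL);

    if (pipe(fds) < 0)
        return -1;
    /* Close-on-exec on the write end: a successful exec closes it,
     * and the parent then reads EOF instead of an errno value. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: close, execv, write and _exit are all
         * async-signal-safe. */
        close(fds[0]);
        execv(path, argv);
        child_errno = errno;
        n = write(fds[1], &child_errno, sizeof(child_errno));
        (void)n;
        _exit(127);
    }

    /* Parent: EOF means exec succeeded, data means it failed. */
    close(fds[1]);
    do {
        n = read(fds[0], &child_errno, sizeof(child_errno));
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;

    if (n == 0)
        return 0;
    return n == (ssize_t)sizeof(child_errno) ? child_errno : -1;
}
```

The parent can then call strerror() or any allocating error-reporting routine on the returned value.  In a process where other threads may fork concurrently, creating the pipe with the close-on-exec flag already set (for example with the Linux-specific pipe2()) avoids the window between pipe() and fcntl().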

Additional info:
first brought up in this upstream thread:
https://www.redhat.com/archives/libvir-list/2011-January/msg01214.html

Comment 8 Dave Allan 2011-11-23 20:06:31 UTC
Closing, as this does not appear to be a problem in practice.

Comment 9 Eric Blake 2011-12-05 19:20:45 UTC
It turns out that this HAS been reported as a problem in practice, though with the localtime_r() lock rather than a malloc() lock.  See bug 757382.