Bug 673543 - libvirt can deadlock when spawning child processes
Summary: libvirt can deadlock when spawning child processes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-28 16:33 UTC by Eric Blake
Modified: 2011-12-05 19:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-23 20:06:31 UTC
Target Upstream Version:



Description Eric Blake 2011-01-28 16:33:07 UTC
Description of problem:
POSIX is explicit that multi-threaded apps can only safely use async-signal-safe functions in between fork() and exec().  This is because the fork may have happened while some other thread holds a mutex, such as the malloc mutex; but the child no longer has the other thread available to release the mutex, so anything the child does that tries to obtain the same mutex, such as malloc, can deadlock.
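
As a minimal sketch of the rule described above (not libvirt's actual code; `spawn_and_wait` and the fixed error message are hypothetical), the child below calls only `execv()`, `write()`, and `_exit()`, all of which POSIX lists as async-signal-safe. Anything that might allocate or take a lock, such as formatting an error string, would have to be done before `fork()`.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of libvirt: run argv[0] and return its
 * exit status, or -1 if fork() or waitpid() fails or the child is killed. */
int spawn_and_wait(char *const argv[])
{
    /* Prepared before fork(): the child must not build strings with
     * malloc(), printf(), or strerror(). */
    static const char msg[] = "exec failed\n";
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. */
        execv(argv[0], argv);

        /* execv() only returns on failure. Report it with write(),
         * then leave with _exit() so no atexit handlers or stdio
         * flushing run in the child. */
        if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
            /* Nothing safe to do about a failed write. */
        }
        _exit(127);
    }

    /* Parent: retry waitpid() if a signal interrupts it. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawn_and_wait((char *[]){ "/bin/sh", "-c", "exit 3", NULL })` would be expected to return the shell's exit status, while a nonexistent path takes the `write()` and `_exit(127)` branch in the child.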

Version-Release number of selected component (if applicable):
libvirt-v0.8.7-4.el6 (all existing RHEL libvirt releases)

How reproducible:
rare, found by code inspection rather than by an actual deadlock

Steps to Reproduce:
It may be possible to force the situation with gdb: block one thread inside malloc(), then invoke another libvirt function that spawns a child process and hits an error, so that libvirt calls malloc() in the child to report it. However, I have not yet tried to construct such a test scenario.
  
Actual results:
Potential for deadlock.

Expected results:
Libvirt should obey the POSIX rules of only using async-signal-safe functions between fork and exec.

Additional info:
first brought up in this upstream thread:
https://www.redhat.com/archives/libvir-list/2011-January/msg01214.html

Comment 8 Dave Allan 2011-11-23 20:06:31 UTC
Closing as this appears to be not a problem in practice.

Comment 9 Eric Blake 2011-12-05 19:20:45 UTC
It turns out that this HAS been reported as a problem in practice, though with the localtime_r() lock rather than a malloc() lock.  See bug 757382.

