Bug 671567

Summary: race condition in libvirt could lead to crash on event handling
Product: Red Hat Enterprise Linux 6
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.0
CC: dyuan, eblake, gren, mzhan, vbian, veillard, xen-maint
Target Milestone: rc
Target Release: 6.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.8.7-4.el6
Doc Type: Bug Fix
Clones: 671569
Last Closed: 2011-05-19 13:26:07 UTC
Bug Blocks: 671569

Description Eric Blake 2011-01-21 21:24:31 UTC
Description of problem:
Libvirt has a race window in which the event loop reads a callback handler's data from an array that other threads can reallocate at the same time. If the race is lost, the callback is invoked with stale data or, worse, libvirt crashes when accessing freed memory.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-3.el6; libvirt-0.8.2-15.el5_6.1

How reproducible:
Found via code inspection; it would be difficult to set up a test that actually exposes the race without using debuggers and/or recompiling to inject arbitrary sleep() calls that force the outcome of the race. The fact that the bug has been upstream for more than 2 years suggests either that the race is uncommon, or that no one has been able to pin an actual failure or data corruption on this particular bug; it does not lessen the severity of the bug itself.

Steps to Reproduce:
1. thread 1 is in daemon/event.c:virEventDispatchHandles and determines, while holding the lock, that a callback must be called; it then releases the lock before invoking the callback
2. thread 2 wakes up and registers a new handle, which causes the eventLoop.handles array to be reallocated
3. thread 1 resumes, and grabs the argument to the callback from stale memory
  
Actual results:
reading stale memory has undefined results: the callback may receive wrong data, or libvirt may crash

Expected results:
thread 1 should no longer refer to the array after releasing the lock

Additional info:
Fixed by the first hunk of this upstream patch:
https://www.redhat.com/archives/libvir-list/2011-January/msg00921.html

Comment 1 Eric Blake 2011-01-21 23:18:15 UTC
Patch posted for 6.1; should be a 6.0.z candidate as well:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg01104.html

Comment 4 Eric Blake 2011-02-11 16:13:47 UTC
Short of code inspection, I think the only way to verify this is with some intensive gdb manipulation to expose the race, combined with MALLOC_PERTURB_ tuning so that realloc() overwrites the just-freed data. It would involve writing a custom test scenario (tests/eventtest.c could be a starting point) that registers just enough events that the next registration triggers the realloc, then starts the event loop, then registers another event in the main thread. In the debugger, you would need a breakpoint in the main thread after the event loop is kicked off but before the next event is registered, plus an instruction-level breakpoint in the event thread just after the array base address has been read outside the lock. Then resume the main thread so the array is reallocated, and switch back to the event thread to prove that stale memory was dereferenced.

I haven't tried to set up such a scenario myself, because it seems like a lot of effort to build a test that will catch a window of only a few assembly instructions, which unfortunately means that the best you may be able to do here is code inspection.

Comment 5 Min Zhan 2011-03-01 06:03:35 UTC
According to Comment #4, checked the source package: libvirt-event-fix-event-handling-data-race.patch is included in libvirt-0.8.7-8.el6.src.rpm.

So this bug is verified as Passed with libvirt-0.8.7-8.el6.x86_64.

Comment 6 Vivian Bian 2011-04-19 02:43:11 UTC
checked with libvirt-0.8.7-18.el6.src.rpm

libvirt-event-fix-event-handling-data-race.patch has been included

So keep the VERIFIED status

Comment 9 errata-xmlrpc 2011-05-19 13:26:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html