Description of problem:
Libvirt has a race window where event handlers make reference to a callback handler retrieved from an array that can be simultaneously reallocated by other threads. If this race is lost, the callback will be called with stale data; or even worse, libvirt will crash when accessing invalidated memory.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-3.el6; libvirt-0.8.2-15.el5_6.1

How reproducible:
Found via code inspection; it would be difficult to set up a test to actually expose the race without the use of debuggers and/or recompilation to inject arbitrary sleep() calls to force the outcome of the race. The fact that the bug has been upstream for more than 2 years suggests that either the race is uncommon, or that no one has been able to pin an actual failure/data corruption on this particular bug, but that does not lessen the severity of the bug itself.

Steps to Reproduce:
1. thread 1 is in daemon/event.c:virEventDispatchHandles, and determines that a callback must be called prior to releasing the lock
2. thread 2 wakes up and registers a new handle, which causes the eventLoop.handles array to be reallocated
3. thread 1 resumes, and grabs the argument to the callback from stale memory

Actual results:
reading stale memory has unspecified results

Expected results:
thread 1 should no longer refer to the array after releasing the lock

Additional info:
Fixed by the first hunk of this upstream patch:
https://www.redhat.com/archives/libvir-list/2011-January/msg00921.html
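For readers who want to see the expected behavior in code, here is a minimal sketch of the general pattern: copy everything the callback needs into locals while the lock is still held, then release the lock and never touch the array again. This is not the libvirt source or the upstream patch. The names struct handle, struct event_loop, loop_add, loop_dispatch and add_fd_cb are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for an event loop's bookkeeping.
 * None of these names come from libvirt. */
typedef void (*handle_cb)(int fd, void *opaque);

struct handle {
    int fd;
    handle_cb cb;
    void *opaque;
};

struct event_loop {
    pthread_mutex_t lock;
    struct handle *handles;   /* may be moved in memory by realloc() */
    size_t count;
    size_t alloc;
};

/* Register a handle. Growing the array may relocate it, which is what
 * invalidates any pointer another thread kept into the old array. */
int loop_add(struct event_loop *loop, int fd, handle_cb cb, void *opaque)
{
    int ret = -1;

    pthread_mutex_lock(&loop->lock);
    if (loop->count == loop->alloc) {
        size_t n = loop->alloc ? loop->alloc * 2 : 4;
        struct handle *tmp = realloc(loop->handles, n * sizeof(*tmp));
        if (!tmp)
            goto out;
        loop->handles = tmp;
        loop->alloc = n;
    }
    loop->handles[loop->count].fd = fd;
    loop->handles[loop->count].cb = cb;
    loop->handles[loop->count].opaque = opaque;
    loop->count++;
    ret = 0;
out:
    pthread_mutex_unlock(&loop->lock);
    return ret;
}

/* Dispatch handle i. The whole entry is copied by value under the lock,
 * so the callback is invoked from the local copy and the array is not
 * referenced after the unlock. */
void loop_dispatch(struct event_loop *loop, size_t i)
{
    struct handle snap;

    pthread_mutex_lock(&loop->lock);
    assert(i < loop->count);
    snap = loop->handles[i];
    pthread_mutex_unlock(&loop->lock);

    snap.cb(snap.fd, snap.opaque);
}

/* Example callback: adds the fd number to an int counter. */
void add_fd_cb(int fd, void *opaque)
{
    *(int *)opaque += fd;
}
```

A caller would initialize the loop with PTHREAD_MUTEX_INITIALIZER and a NULL array, register handles with loop_add(), and call loop_dispatch() from the event thread. Because the lock is dropped before the callback runs, the callback itself may register new handles without deadlocking.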
Patch posted for 6.1; should be a 6.0.z candidate as well: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg01104.html
Short of code inspection, I think the only way to verify this is with some intensive gdb manipulation to expose the race, as well as using MALLOC_PERTURB_ tuning to make realloc() overwrite just-freed data. It would involve writing a custom test scenario (although we could use tests/eventtest.c as a start) to set up just enough events where the next registration would trigger the realloc, then start the event loop, then register another event in the main thread. Then, in the debugger, you'd have to put a breakpoint in the main thread after the event loop is kicked off but before registering the next event, as well as an instruction level breakpoint at the point after the array base address has been read outside the lock, then resume the main thread to cause the array to be realloced, then back to the event thread to prove that stale memory was dereferenced. I haven't tried to set up such a scenario myself, because it seems like a lot of effort to set up such a test that will catch a window of only a few assembly instructions. Which unfortunately means that the best you may be able to do here is code inspection.
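As a rough illustration of the "set up just enough events so the next registration triggers the realloc" idea, the sketch below replays the interleaving in a single thread, with no debugger involved. It is an assumption-laden toy, not a libvirt test: struct slot, struct table, table_add, table_next_add_moves and simulate_interleaving are hypothetical names, and growth is done with malloc() plus free() rather than realloc() so that the base address is guaranteed to change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a handle table; not libvirt code. */
struct slot {
    int watch;
    void *opaque;
};

struct table {
    struct slot *slots;
    size_t count;
    size_t alloc;
};

/* Append an entry. Growth allocates the new buffer before freeing the
 * old one, so the array is guaranteed to move to a different address. */
int table_add(struct table *t, int watch, void *opaque)
{
    if (t->count == t->alloc) {
        size_t n = t->alloc ? t->alloc * 2 : 4;
        struct slot *fresh = malloc(n * sizeof(*fresh));
        if (!fresh)
            return -1;
        if (t->count)
            memcpy(fresh, t->slots, t->count * sizeof(*fresh));
        free(t->slots);
        t->slots = fresh;
        t->alloc = n;
    }
    t->slots[t->count].watch = watch;
    t->slots[t->count].opaque = opaque;
    t->count++;
    return 0;
}

/* Returns 1 when the next table_add() will have to grow the array. */
int table_next_add_moves(const struct table *t)
{
    return t->count == t->alloc;
}

/* Replays the three steps from the report in one thread.
 * Returns 1 if the array moved while the by-value copy stayed intact,
 * 0 otherwise, and -1 on allocation failure. */
int simulate_interleaving(void)
{
    struct table t = { NULL, 0, 0 };
    int payload = 42;
    uintptr_t base_before, base_after;
    struct slot copy;
    int moved, copy_ok;

    /* Fill the table until one more registration forces a move. */
    while (t.count == 0 || !table_next_add_moves(&t)) {
        if (table_add(&t, (int)t.count, &payload) < 0) {
            free(t.slots);
            return -1;
        }
    }

    /* "Thread 1", still holding the lock: remember the base address
     * as an integer and take a by-value copy of entry 0. */
    base_before = (uintptr_t)t.slots;
    copy = t.slots[0];

    /* "Thread 2": one more registration relocates the array. */
    if (table_add(&t, (int)t.count, &payload) < 0) {
        free(t.slots);
        return -1;
    }
    base_after = (uintptr_t)t.slots;

    /* A pointer kept from before the move would now be stale; the
     * copy is still usable because it never pointed into the array. */
    moved = (base_before != base_after);
    copy_ok = (copy.watch == 0 && copy.opaque == &payload);

    free(t.slots);
    return moved && copy_ok;
}
```

This only shows that the address really changes at the predicted registration; it does not reproduce the race, which still needs the debugger choreography described above.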
According to Comment #4, I checked the source package and confirmed that libvirt-event-fix-event-handling-data-race.patch has been included in libvirt-0.8.7-8.el6.src.rpm. So this bug is verified as Passed with libvirt-0.8.7-8.el6.x86_64.
Checked with libvirt-0.8.7-18.el6.src.rpm: libvirt-event-fix-event-handling-data-race.patch has been included, so keeping the VERIFIED status.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0596.html