671567 – race condition in libvirt could lead to crash on event handling

Bug 671567 - race condition in libvirt could lead to crash on event handling

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	6.1
Assignee:	Eric Blake
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	671569

Reported:	2011-01-21 21:24 UTC by Eric Blake
Modified:	2011-05-19 13:26 UTC
CC List:	7 users
Fixed In Version:	libvirt-0.8.7-4.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	671569
Environment:
Last Closed:	2011-05-19 13:26:07 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0596	0	normal	SHIPPED_LIVE	libvirt bug fix and enhancement update	2011-05-18 17:56:36 UTC

Description Eric Blake 2011-01-21 21:24:31 UTC

Description of problem:
Libvirt has a race window where event handlers make reference to a callback handler retrieved from an array that can be simultaneously reallocated by other threads.  If this race is lost, the callback will be called with stale data; or even worse, libvirt will crash when accessing invalidated memory.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-3.el6; libvirt-0.8.2-15.el5_6.1

How reproducible:
Found via code inspection; it would be difficult to set up a test that actually exposes the race without using a debugger and/or recompiling to inject arbitrary sleep() calls that force the outcome of the race.  The fact that the bug has been upstream for more than 2 years suggests that either the race is rarely hit or that no one has been able to pin an actual failure or data corruption on this particular bug, but that does not lessen the severity of the bug itself.

Steps to Reproduce:
1. thread 1 is in daemon/event.c:virEventDispatchHandles and, while still holding the lock, determines that a callback must be called
2. thread 2 wakes up and registers a new handle, which causes the eventLoop.handles array to be reallocated
3. thread 1 resumes, and grabs the argument to the callback from stale memory
  
Actual results:
reading stale memory has unspecified results

Expected results:
thread 1 should no longer refer to the array after releasing the lock

Additional info:
Fixed by the first hunk of this upstream patch:
https://www.redhat.com/archives/libvir-list/2011-January/msg00921.html
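
A minimal sketch of the pattern described in the steps above, not the actual libvirt code or patch. All names here (struct event_loop, struct handle, add_handle, dispatch_handle, sum_events_cb) are hypothetical stand-ins. The idea is that the dispatcher copies everything it needs out of the array while it still holds the lock, so a concurrent registration that moves the array via realloc() cannot leave it reading stale memory:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for an event loop's handle table. */
typedef void (*handle_cb)(int watch, int fd, int events, void *opaque);

struct handle {
    int watch;
    int fd;
    handle_cb cb;
    void *opaque;
};

struct event_loop {
    pthread_mutex_t lock;
    struct handle *handles;  /* may be moved by realloc() in add_handle() */
    size_t count;
    size_t alloc;
};

void event_loop_init(struct event_loop *loop)
{
    pthread_mutex_init(&loop->lock, NULL);
    loop->handles = NULL;
    loop->count = 0;
    loop->alloc = 0;
}

/* Register a handle. Growing the array may move it, which invalidates
 * any pointer into the old array held by another thread. */
int add_handle(struct event_loop *loop, int fd, handle_cb cb, void *opaque)
{
    int watch = -1;

    assert(loop && cb);
    pthread_mutex_lock(&loop->lock);
    if (loop->count == loop->alloc) {
        size_t n = loop->alloc ? loop->alloc * 2 : 4;
        struct handle *tmp = realloc(loop->handles, n * sizeof(*tmp));
        if (!tmp)
            goto done;
        loop->handles = tmp;
        loop->alloc = n;
    }
    watch = (int)loop->count;
    loop->handles[loop->count++] = (struct handle){ watch, fd, cb, opaque };
done:
    pthread_mutex_unlock(&loop->lock);
    return watch;
}

/* Dispatch handle i. Every field is copied into locals under the lock;
 * after the unlock, the array is never touched again. */
int dispatch_handle(struct event_loop *loop, size_t i, int events)
{
    pthread_mutex_lock(&loop->lock);
    if (i >= loop->count) {
        pthread_mutex_unlock(&loop->lock);
        return -1;
    }
    handle_cb cb = loop->handles[i].cb;
    int watch = loop->handles[i].watch;
    int fd = loop->handles[i].fd;
    void *opaque = loop->handles[i].opaque;
    pthread_mutex_unlock(&loop->lock);

    /* The callback runs unlocked, so it is free to register handles. */
    cb(watch, fd, events, opaque);
    return 0;
}

/* Example callback for illustration: adds the event mask to an int. */
void sum_events_cb(int watch, int fd, int events, void *opaque)
{
    (void)watch;
    (void)fd;
    *(int *)opaque += events;
}
```

The racy variant would instead release the lock first and then read loop->handles[i].opaque while making the call, which is the stale read described in step 3.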

Comment 1 Eric Blake 2011-01-21 23:18:15 UTC

Patch posted for 6.1; should be a 6.0.z candidate as well:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg01104.html

Comment 4 Eric Blake 2011-02-11 16:13:47 UTC

Short of code inspection, I think the only way to verify this is with some intensive gdb manipulation to expose the race, combined with MALLOC_PERTURB_ tuning so that realloc() overwrites just-freed data.  It would involve writing a custom test scenario (tests/eventtest.c could serve as a starting point) that sets up just enough events that the next registration triggers the realloc, then starts the event loop, then registers another event in the main thread.  In the debugger, you'd need a breakpoint in the main thread after the event loop is kicked off but before the next event is registered, plus an instruction-level breakpoint at the point after the array base address has been read outside the lock.  You would then resume the main thread to cause the array to be reallocated, and switch back to the event thread to prove that stale memory was dereferenced.

I haven't tried to set up such a scenario myself, because it seems like a lot of effort to set up such a test that will catch a window of only a few assembly instructions.  Which unfortunately means that the best you may be able to do here is code inspection.

Comment 5 Min Zhan 2011-03-01 06:03:35 UTC

Per Comment #4, checked the source package: libvirt-event-fix-event-handling-data-race.patch is included in libvirt-0.8.7-8.el6.src.rpm.

Marking this bug as verified (Passed) with libvirt-0.8.7-8.el6.x86_64.

Comment 6 Vivian Bian 2011-04-19 02:43:11 UTC

Checked with libvirt-0.8.7-18.el6.src.rpm.

libvirt-event-fix-event-handling-data-race.patch is included.

Keeping the VERIFIED status.

Comment 9 errata-xmlrpc 2011-05-19 13:26:07 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html
