Bug 951073 - libvirtd crash on race with auto-destroy guests
Summary: libvirtd crash on race with auto-destroy guests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 950286
Blocks:
 
Reported: 2013-04-11 12:13 UTC by Chris Pelland
Modified: 2013-04-18 15:55 UTC
CC List: 16 users

Fixed In Version: libvirt-0.10.2-18.el6_4.4
Doc Type: Bug Fix
Doc Text:
Under certain conditions, when a connection was closed, guests set to be automatically destroyed failed to be destroyed and the libvirtd daemon terminated unexpectedly. A series of patches addressing various crash scenarios has been provided and libvirtd no longer crashes while auto-destroying guests.
Clone Of:
Environment:
Last Closed: 2013-04-18 15:55:46 UTC
Target Upstream Version:
Embargoed:


Attachments
valgrind.log when creating an auto-destroy guest. (1.87 MB, text/plain), attached 2013-04-15 06:10 UTC by EricLee


Links
Red Hat Product Errata RHBA-2013:0756 (normal, SHIPPED_LIVE): libvirt bug fix update, last updated 2013-04-18 19:53:38 UTC

Description Chris Pelland 2013-04-11 12:13:49 UTC
This bug has been copied from bug #950286 and has been proposed
to be backported to 6.4 z-stream (EUS).

Comment 6 Eric Blake 2013-04-12 21:53:12 UTC
Of the series mentioned in comment 4, only two of the six patches were necessary for 0.10.2 (commits 1/6 and 5/6).  Here's a reproducer demonstrating the need for 1/6:

In one root window:
$ service libvirtd stop
$ setenforce 0 # necessary since SELinux doesn't transition valgrind to run qemu
$ valgrind libvirtd

In another root window:
$ virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # create --autodestroy fedora-local.xml
Domain fedora-local created from fedora-local.xml

virsh # list
 Id    Name                           State
----------------------------------------------------
 2     fedora-local                   running

virsh # quit

Look at the first window, which now shows:

==25304== Invalid read of size 1
==25304==    at 0x4A07F52: strlen (mc_replace_strmem.c:403)
==25304==    by 0x4E97EE5: virHashStrCode (virhash.c:78)
==25304==    by 0x4E98010: virHashComputeKey (virhash.c:100)
==25304==    by 0x4E989F5: virHashRemoveEntry (virhash.c:461)
==25304==    by 0x12A77A5A: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304==    by 0x4F1E202: virConnectDispose (datatypes.c:144)
==25304==    by 0x4E913DF: virObjectUnref (virobject.c:139)
==25304==    by 0x4F285DF: virConnectClose (libvirt.c:1458)
==25304==    by 0x432159: remoteClientFreeFunc (remote.c:653)
==25304==  Address 0x1554e280 is 0 bytes inside a block of size 37 free'd
==25304==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==25304==    by 0x4E78C12: virFree (memory.c:309)
==25304==    by 0x4E97FC1: virHashStrFree (virhash.c:93)
==25304==    by 0x4E98AE9: virHashRemoveEntry (virhash.c:470)
==25304==    by 0x12A77742: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==25304==    by 0x12A84163: qemuProcessAutoDestroyRemove (qemu_process.c:4460)
==25304==    by 0x12A82A90: qemuProcessStop (qemu_process.c:4068)
==25304==    by 0x12A83F92: qemuProcessAutoDestroy (qemu_process.c:4429)
==25304==    by 0x12A77A26: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304== 
==25304== Invalid read of size 1
==25304==    at 0x4A07F64: strlen (mc_replace_strmem.c:403)
==25304==    by 0x4E97EE5: virHashStrCode (virhash.c:78)
==25304==    by 0x4E98010: virHashComputeKey (virhash.c:100)
==25304==    by 0x4E989F5: virHashRemoveEntry (virhash.c:461)
==25304==    by 0x12A77A5A: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304==    by 0x4F1E202: virConnectDispose (datatypes.c:144)
==25304==    by 0x4E913DF: virObjectUnref (virobject.c:139)
==25304==    by 0x4F285DF: virConnectClose (libvirt.c:1458)
==25304==    by 0x432159: remoteClientFreeFunc (remote.c:653)
==25304==  Address 0x1554e281 is 1 bytes inside a block of size 37 free'd
==25304==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==25304==    by 0x4E78C12: virFree (memory.c:309)
==25304==    by 0x4E97FC1: virHashStrFree (virhash.c:93)
==25304==    by 0x4E98AE9: virHashRemoveEntry (virhash.c:470)
==25304==    by 0x12A77742: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==25304==    by 0x12A84163: qemuProcessAutoDestroyRemove (qemu_process.c:4460)
==25304==    by 0x12A82A90: qemuProcessStop (qemu_process.c:4068)
==25304==    by 0x12A83F92: qemuProcessAutoDestroy (qemu_process.c:4429)
==25304==    by 0x12A77A26: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304== 
==25304== Invalid read of size 2
==25304==    at 0x4A08D6C: memcpy (mc_replace_strmem.c:882)
==25304==    by 0x4E993D8: getblock (virhashcode.c:39)
==25304==    by 0x4E994BF: virHashCodeGen (virhashcode.c:80)
==25304==    by 0x4E97EFA: virHashStrCode (virhash.c:78)
==25304==    by 0x4E98010: virHashComputeKey (virhash.c:100)
==25304==    by 0x4E989F5: virHashRemoveEntry (virhash.c:461)
==25304==    by 0x12A77A5A: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304==    by 0x4F1E202: virConnectDispose (datatypes.c:144)
==25304==    by 0x4E913DF: virObjectUnref (virobject.c:139)
==25304==  Address 0x1554e282 is 2 bytes inside a block of size 37 free'd
==25304==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==25304==    by 0x4E78C12: virFree (memory.c:309)
==25304==    by 0x4E97FC1: virHashStrFree (virhash.c:93)
==25304==    by 0x4E98AE9: virHashRemoveEntry (virhash.c:470)
==25304==    by 0x12A77742: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==25304==    by 0x12A84163: qemuProcessAutoDestroyRemove (qemu_process.c:4460)
==25304==    by 0x12A82A90: qemuProcessStop (qemu_process.c:4068)
==25304==    by 0x12A83F92: qemuProcessAutoDestroy (qemu_process.c:4429)
==25304==    by 0x12A77A26: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==25304==    by 0x4E98C6A: virHashForEach (virhash.c:514)
==25304==    by 0x12A77B02: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==25304==    by 0x12AAC17A: qemudClose (qemu_driver.c:1053)
==25304==

Comment 7 Eric Blake 2013-04-12 22:23:40 UTC
The use-after-free fixed by patch 1/6 is the read of uuidstr in this statement:
        virHashRemoveEntry(data->driver->closeCallbacks, uuidstr);

After more playing around with gdb, I determined that we only hit the use-after-free when the callback has already removed the domain; in that case, the caller also removed uuidstr from the table of callbacks as part of removing the domain.  In the most common case the freed memory is untouched, so we are merely asking to remove something that no longer exists in the table; the removal fails, which is benign to normal operation, and at least we are not leaking a reference in the table.

The following are two theoretical worst-case scenarios, although both are fairly unlikely:

1. After freeing the memory, libvirtd's memory map changes so that the pointer is no longer mapped into the process, and reading it causes a segfault when it touches protected memory.

2. Something else reallocates the memory and happens to rewrite it to match a 36-byte uuidstr that is still registered in the hash table, thus removing the close-callback entry of an unrelated domain.  Later actions on that other domain would then no longer be cleaned up properly on close.
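
To make the aliasing concrete, here is a self-contained sketch of the buggy pattern and of one plausible shape of the fix (hypothetical stand-in code with invented names such as struct entry and autodestroy_cb; this is not the actual libvirt source or the literal patch):

#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for one close-callback entry; the "table" owns 'key'. */
struct entry {
    char *key;                        /* 36-char UUID string + NUL = 37 bytes */
    void (*cb)(struct entry **slot);  /* close callback */
};

static void entry_remove(struct entry **slot)
{
    if (*slot) {
        free((*slot)->key);           /* frees the key the caller may still alias */
        free(*slot);
        *slot = NULL;
    }
}

/* Like qemuProcessAutoDestroy(): tearing down the domain also
 * unregisters its own close-callback entry as a side effect. */
static void autodestroy_cb(struct entry **slot)
{
    entry_remove(slot);
}

static struct entry *entry_new(void)
{
    struct entry *e = malloc(sizeof(*e));
    e->key = strdup("11111111-2222-3333-4444-555555555555");
    e->cb = autodestroy_cb;
    return e;
}

int main(void)
{
    /* Buggy pattern: keep an alias into table-owned memory across the
     * callback.  Uncommenting the strlen() call reproduces valgrind's
     * "Invalid read ... inside a block of size 37 free'd". */
    struct entry *slot = entry_new();
    const char *uuidstr = slot->key;
    slot->cb(&slot);
    /* printf("%zu\n", strlen(uuidstr)); */  /* use-after-free */
    (void) uuidstr;

    /* Fixed pattern: copy the key to stack storage before running the
     * callback, so the later table lookup never touches freed memory. */
    slot = entry_new();
    char uuidcopy[37];
    snprintf(uuidcopy, sizeof(uuidcopy), "%s", slot->key);
    slot->cb(&slot);
    printf("post-callback lookup key: %s\n", uuidcopy);  /* safe */
    return 0;
}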

Comment 9 EricLee 2013-04-15 06:07:47 UTC
I can reproduce the bug with libvirt-0.10.2-18.el6_4.3:

Following the steps in comment #6:

I get invalid reads like:
==4885== Invalid read of size 1
==4885==    at 0x4A07F52: strlen (mc_replace_strmem.c:403)
==4885==    by 0x3B03E71067: virHashStrCode (virhash.c:78)
==4885==    by 0x3B03E704FC: virHashComputeKey (virhash.c:100)
==4885==    by 0x3B03E70B5C: virHashRemoveEntry (virhash.c:461)
==4885==    by 0x488ACD: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885==    by 0x3B03E6B52A: virObjectUnref (virobject.c:139)
==4885==    by 0x3B03EEBAC7: virConnectClose (libvirt.c:1458)
==4885==    by 0x4406C6: remoteClientFreeFunc (remote.c:678)
==4885==  Address 0x12e5f9a0 is 0 bytes inside a block of size 37 free'd
==4885==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==4885==    by 0x3B03E58718: virFree (memory.c:419)
==4885==    by 0x3B03E70D72: virHashStrFree (virhash.c:93)
==4885==    by 0x3B03E70BC8: virHashRemoveEntry (virhash.c:470)
==4885==    by 0x488D1F: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==4885==    by 0x4B31BA: qemuProcessStop (qemu_process.c:4276)
==4885==    by 0x4B39FF: qemuProcessAutoDestroy (qemu_process.c:4638)
==4885==    by 0x488AAD: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885== 
==4885== Invalid read of size 1
==4885==    at 0x4A07F64: strlen (mc_replace_strmem.c:403)
==4885==    by 0x3B03E71067: virHashStrCode (virhash.c:78)
==4885==    by 0x3B03E704FC: virHashComputeKey (virhash.c:100)
==4885==    by 0x3B03E70B5C: virHashRemoveEntry (virhash.c:461)
==4885==    by 0x488ACD: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885==    by 0x3B03E6B52A: virObjectUnref (virobject.c:139)
==4885==    by 0x3B03EEBAC7: virConnectClose (libvirt.c:1458)
==4885==    by 0x4406C6: remoteClientFreeFunc (remote.c:678)
==4885==  Address 0x12e5f9a1 is 1 bytes inside a block of size 37 free'd
==4885==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==4885==    by 0x3B03E58718: virFree (memory.c:419)
==4885==    by 0x3B03E70D72: virHashStrFree (virhash.c:93)
==4885==    by 0x3B03E70BC8: virHashRemoveEntry (virhash.c:470)
==4885==    by 0x488D1F: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==4885==    by 0x4B31BA: qemuProcessStop (qemu_process.c:4276)
==4885==    by 0x4B39FF: qemuProcessAutoDestroy (qemu_process.c:4638)
==4885==    by 0x488AAD: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885== 
==4885== Invalid read of size 4
==4885==    at 0x3B03E71207: virHashCodeGen (virhashcode.c:82)
==4885==    by 0x3B03E704FC: virHashComputeKey (virhash.c:100)
==4885==    by 0x3B03E70B5C: virHashRemoveEntry (virhash.c:461)
==4885==    by 0x488ACD: qemuDriverCloseCallbackRun (qemu_conf.c:734)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885==    by 0x3B03E6B52A: virObjectUnref (virobject.c:139)
==4885==    by 0x3B03EEBAC7: virConnectClose (libvirt.c:1458)
==4885==    by 0x4406C6: remoteClientFreeFunc (remote.c:678)
==4885==    by 0x3B03F3E65D: virNetServerClientDispose (virnetserverclient.c:590)
==4885==  Address 0x12e5f9a0 is 0 bytes inside a block of size 37 free'd
==4885==    at 0x4A063F0: free (vg_replace_malloc.c:446)
==4885==    by 0x3B03E58718: virFree (memory.c:419)
==4885==    by 0x3B03E70D72: virHashStrFree (virhash.c:93)
==4885==    by 0x3B03E70BC8: virHashRemoveEntry (virhash.c:470)
==4885==    by 0x488D1F: qemuDriverCloseCallbackUnset (qemu_conf.c:674)
==4885==    by 0x4B31BA: qemuProcessStop (qemu_process.c:4276)
==4885==    by 0x4B39FF: qemuProcessAutoDestroy (qemu_process.c:4638)
==4885==    by 0x488AAD: qemuDriverCloseCallbackRun (qemu_conf.c:730)
==4885==    by 0x3B03E70838: virHashForEach (virhash.c:514)
==4885==    by 0x4889B6: qemuDriverCloseCallbackRunAll (qemu_conf.c:746)
==4885==    by 0x46A1D7: qemudClose (qemu_driver.c:1123)
==4885==    by 0x3B03ED48CE: virConnectDispose (datatypes.c:144)
==4885== 

Verified with libvirt-0.10.2-18.el6_4.4: the same steps no longer produce the invalid reads above.

So I am setting VERIFIED.

However, I get another leak when creating the auto-destroy guest:

==5399== Warning: noted but unhandled ioctl 0x89a2 with no size/direction hints
==5399==    This could cause spurious value errors to appear.
==5399==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5508== 
==5508== HEAP SUMMARY:
==5508==     in use at exit: 1,627,028 bytes in 14,129 blocks
==5508==   total heap usage: 139,855 allocs, 125,726 frees, 518,153,487 bytes allocated
==5508== 
==5508== LEAK SUMMARY:
==5508==    definitely lost: 1,128 bytes in 3 blocks
==5508==    indirectly lost: 0 bytes in 0 blocks
==5508==      possibly lost: 3,437 bytes in 43 blocks
==5508==    still reachable: 1,622,463 bytes in 14,083 blocks
==5508==         suppressed: 0 bytes in 0 blocks

For details, please see the next attachment: valgrind.log.

Hi Eric,

Do you think we should open another bug to track this?

Thanks,
EricLee

Comment 10 EricLee 2013-04-15 06:10:17 UTC
Created attachment 735773 [details]
valgrind.log when creating an auto-destroy guest.

Comment 11 Eric Blake 2013-04-15 20:59:53 UTC
(In reply to comment #9)
> I can reproduce the bug with libvirt-0.10.2-18.el6_4.3:
> 

> 
> Verified with libvirt-0.10.2-18.el6_4.4: the same steps no longer produce
> the invalid reads above.
> 
> So I am setting VERIFIED.
> 
> However, I get another leak when creating the auto-destroy guest:
> 
> ==5399== Warning: noted but unhandled ioctl 0x89a2 with no size/direction
> hints
> ==5399==    This could cause spurious value errors to appear.
> ==5399==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a
> proper wrapper.

This part is not worth worrying about (it's in the child pid, so not a leak in libvirtd proper).

> ==5508== 
> ==5508== HEAP SUMMARY:
> ==5508==     in use at exit: 1,627,028 bytes in 14,129 blocks
> ==5508==   total heap usage: 139,855 allocs, 125,726 frees, 518,153,487
> bytes allocated
> ==5508== 
> ==5508== LEAK SUMMARY:
> ==5508==    definitely lost: 1,128 bytes in 3 blocks
> ==5508==    indirectly lost: 0 bytes in 0 blocks
> ==5508==      possibly lost: 3,437 bytes in 43 blocks
> ==5508==    still reachable: 1,622,463 bytes in 14,083 blocks
> ==5508==         suppressed: 0 bytes in 0 blocks

And you didn't paste enough context to say whether this is worth worrying about either.  If you don't tell valgrind to ignore child processes, then we know that every fork/exec produces several (harmless) reports of leaks in the child just before the exec.  I typically ignore child processes and focus only on leaks in libvirtd proper.
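
As a minimal standalone illustration of where that pre-exec noise comes from (hypothetical code, not libvirt):

#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    char *state = strdup("long-lived daemon state");  /* parent-owned heap */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: it inherits 'state' but never frees it before the exec
         * replaces the image, so valgrind (which follows the forked
         * child up to the exec, and beyond it only with
         * --trace-children=yes) reports the block against the child's
         * pid.  That is the harmless noise described above. */
        execlp("true", "true", (char *) NULL);
        _exit(127);                    /* reached only if exec fails */
    }

    waitpid(pid, NULL, 0);
    free(state);                       /* the parent's own report stays clean */
    return 0;
}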

> 
> For details, please see the next attachment: valgrind.log.

That particular log doesn't match the pids in the message quoted above, but it looks okay to me.  Note this part at the end (any leak in that log not attributed to pid 6074 is not a leak in libvirtd proper, and thus not important to fix in z-stream):

==6074== 
==6074== LEAK SUMMARY:
==6074==    definitely lost: 0 bytes in 0 blocks
==6074==    indirectly lost: 0 bytes in 0 blocks
==6074==      possibly lost: 3,680 bytes in 10 blocks
==6074==    still reachable: 877,272 bytes in 9,011 blocks
==6074==         suppressed: 0 bytes in 0 blocks

> 
> Hi Eric,
> 
> Do you think we should open another bug to track this?

A new BZ would be appropriate if you can determine that such leaks were caused by the 0.10.2-18.el6_4.4 build; but most likely, there is no new problem here, and you could get just as noisy a valgrind log from 0.10.2-18.el6_4.3 or earlier.

Comment 13 errata-xmlrpc 2013-04-18 15:55:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0756.html

