Bug 999765

Summary: Race condition in libvirt causes hang with qemu 1.6
Product: [Community] Virtualization Tools
Reporter: Joseph Wang <joequant>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED DEFERRED
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: crobinso, hzguanqiang, rbalakri, sipingal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-10 14:06:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Joseph Wang 2013-08-22 04:59:04 UTC
Description of problem:

On my setup, libvirtd hangs while computing the capabilities for qemu. The cause is a race condition in libvirt's virCommandRun. libvirt starts the qemu backends in daemonize mode.

If the qemu backend exits before libvirt reaches the poll in virCommandProcessIO, the sockets are connected to a zombie process and libvirtd hangs.
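
To illustrate why a poll with an infinite timeout can block forever, here is a minimal standalone sketch. It is not libvirt code, and the helper name poll_pipe is hypothetical. It polls the read end of a pipe in two situations: with the write end still held open but silent, and with the write end closed. A finite timeout stands in for the -1 seen in the backtrace so that the example terminates.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper for illustration only (not part of libvirt).
 * Polls the read end of a fresh pipe for up to timeout_ms milliseconds.
 * If close_writer is nonzero, the write end is closed before polling.
 * Returns poll()'s result: 0 on timeout, >0 if an event was reported,
 * -1 on error. */
int poll_pipe(int close_writer, int timeout_ms)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    if (close_writer) {
        close(fds[1]);
        fds[1] = -1;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    if (fds[1] >= 0)
        close(fds[1]);
    return ret;
}
```

With the writer still open and no data written, poll_pipe(0, 50) times out; with the writer closed, poll_pipe(1, 50) returns immediately because a hang-up is reported. As long as some process keeps the other end open without ever writing, a poll with timeout -1 has nothing to wake it up.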

The stack trace is as follows:

Breakpoint 3, virCommandProcessIO (cmd=cmd@entry=0x7fffd4103f20)
    at util/vircommand.c:1884
1884	        if (poll(fds, nfds, -1) < 0) {
(gdb) where
#0  virCommandProcessIO (cmd=cmd@entry=0x7fffd4103f20)
    at util/vircommand.c:1884
#1  0x00007ffff753cb32 in virCommandRun (cmd=cmd@entry=0x7fffd4103f20, 
    exitstatus=exitstatus@entry=0x7fffdb87a120) at util/vircommand.c:2100
#2  0x00007fffdd7f0a40 in virQEMUCapsInitQMP (runGid=0, runUid=0, 
    libDir=<optimized out>, qemuCaps=0x7fffd4088910)
    at qemu/qemu_capabilities.c:2529
#3  virQEMUCapsNewForBinary (
    binary=binary@entry=0x7fffd40acb30 "/usr/bin/qemu-system-cris", 
    libDir=<optimized out>, runUid=0, runGid=0)
    at qemu/qemu_capabilities.c:2677
#4  0x00007fffdd7f246b in virQEMUCapsCacheLookup (
    cache=cache@entry=0x7fffd40aca30, 
    binary=0x7fffd40acb30 "/usr/bin/qemu-system-cris")
    at qemu/qemu_capabilities.c:2763
#5  0x00007fffdd7f2961 in virQEMUCapsInitGuest (guestarch=VIR_ARCH_CRIS, 
    hostarch=VIR_ARCH_X86_64, cache=0x7fffd40aca30, caps=0x7fffd40acde0)
    at qemu/qemu_capabilities.c:685
#6  virQEMUCapsInit (cache=0x7fffd40aca30) at qemu/qemu_capabilities.c:905
#7  0x00007fffdd8202bb in virQEMUDriverCreateCapabilities (
    driver=driver@entry=0x7fffd40a7240) at qemu/qemu_conf.c:569
#8  0x00007fffdd852fe4 in qemuStateInitialize (privileged=<optimized out>, 
    callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:748

How reproducible:

On my machine libvirtd locks up consistently with qemu 1.6 but works with qemu 1.5. However, as this is a race condition, it is likely to behave differently on different machines.

The solution is to check if the process is a zombie before attempting to poll its sockets.

Comment 1 Cole Robinson 2016-04-10 14:06:34 UTC
Sorry this didn't receive a timely response. I recall a fix for this going into libvirt around the time of this bug, but I can't find the commit. Given the age of this bug I'm closing it as DEFERRED, but if anyone still hits similar issues with recent libvirt and qemu, please reopen.

Comment 2 Cole Robinson 2016-04-10 15:20:39 UTC
*** Bug 1028728 has been marked as a duplicate of this bug. ***