Bug 727249

Summary: managedsave can crash libvirt
Product: Red Hat Enterprise Linux 6
Reporter: Eric Blake <eblake>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Priority: unspecified
Version: 6.2
CC: ajia, dyuan, rwu, whuang, ydu, yupzhang
Target Milestone: rc
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.9.4-0rc1.2.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:20:31 UTC
Bug Depends On: 690175

Description Eric Blake 2011-08-01 16:34:32 UTC
Description of problem:
See this upstream thread: https://www.redhat.com/archives/libvir-list/2011-July/msg01969.html

Running a loop of 'virsh managedsave dom && virsh start dom' while virt-manager is connected to libvirtd can crash libvirtd, because a sync job (the query commands issued by virt-manager) and an async job (the managed save) can both end up using the qemu monitor at the same time.
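To make "using the qemu monitor at the same time" concrete, here is a toy C model (pthreads) of a monitor as a single request/reply channel shared by a sync job and an async job. Everything in it (`struct monitor`, `monitor_command`, `run_two_jobs`) is hypothetical and far simpler than libvirt's real job and monitor locking; it does not reproduce the crash, it only shows the invariant that libvirt's job bookkeeping is meant to guarantee: one job owns the monitor for a whole request/reply exchange.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Toy monitor: one request/reply slot shared by all jobs.
 * Hypothetical model, not libvirt's qemuMonitor implementation. */
struct monitor {
    pthread_mutex_t lock;   /* held for a whole request/reply exchange */
    int pending;            /* id of the command currently "on the wire" */
};

/* Send command 'id' and return the id found in the reply slot. */
static int monitor_command(struct monitor *mon, int id)
{
    int reply;

    pthread_mutex_lock(&mon->lock);      /* "enter the monitor" */
    mon->pending = id;                   /* write the request */
    sched_yield();                       /* without the lock, another job could overwrite 'pending' here */
    reply = mon->pending;                /* read the reply (this toy monitor echoes the id) */
    pthread_mutex_unlock(&mon->lock);    /* "exit the monitor" */
    return reply;
}

struct job {
    struct monitor *mon;
    int base;         /* first command id used by this job */
    int mismatches;   /* replies that belonged to some other command */
};

static void *run_job(void *arg)
{
    struct job *job = arg;

    for (int i = 0; i < 1000; i++) {
        int id = job->base + i;
        if (monitor_command(job->mon, id) != id)
            job->mismatches++;
    }
    return NULL;
}

/* Run a "sync" job and an "async" job concurrently; return total mismatches. */
static int run_two_jobs(void)
{
    struct monitor mon = { PTHREAD_MUTEX_INITIALIZER, 0 };
    struct job sync_job = { &mon, 0, 0 };
    struct job async_job = { &mon, 100000, 0 };
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, run_job, &sync_job);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, run_job, &async_job);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return sync_job.mismatches + async_job.mismatches;
}
```

Because each exchange runs under the lock, `run_two_jobs()` returns 0. Deleting the lock/unlock pair turns `pending` into a data race, and a job can then read back the reply to the other job's command.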

Version-Release number of selected component (if applicable):
libvirt-0.9.4-0rc2.el6.x86_64

How reproducible:
Before the patch, the loop crashed libvirtd within 10 iterations; after the patch, it completed 20 iterations without failure.

Steps to Reproduce:
1. Connect virt-manager to libvirtd.
2. for i in `seq 20`; do virsh managedsave dom && virsh start dom || { echo failed on $i; break; }; done

Actual results:
error: Failed to save domain dom state
error: End of file while reading data: Input/output error
and libvirtd has crashed.

Expected results:
libvirtd does not crash.

Additional info:
proposed upstream patch:
https://www.redhat.com/archives/libvir-list/2011-July/msg02077.html
That thread also mentioned a potential issue with killing libvirtd in the middle of a managed save, but I think it is a distinct issue and will open a second bug for it.

Comment 1 Eric Blake 2011-08-01 16:37:13 UTC
Regression introduced in upstream commit 361842881e (after 0.9.3 but before 0.9.4-rc1).

Comment 3 Eric Blake 2011-08-01 16:49:53 UTC
See bug 727254 for another, less severe managedsave issue that I noticed while working on the patch for this bug.

Comment 4 Eric Blake 2011-08-01 17:05:00 UTC
In POST:

commit 193cd0f3c879619619a3c35d25311e98693fe2ef
Author: Eric Blake <eblake>
Date:   Thu Jul 28 17:18:24 2011 -0600

    qemu: fix crash when mixing sync and async monitor jobs
    
    Currently, we allow a sync job and an async job to run at the same time.
    This means that the monitor commands for the two jobs can be run in any order.
    
    In the function qemuDomainObjEnterMonitorInternal():
        if (priv->job.active == QEMU_JOB_NONE && priv->job.asyncJob) {
            if (qemuDomainObjBeginNestedJob(driver, obj) < 0)
    We check whether the caller is an async job by looking at priv->job.active
    and priv->job.asyncJob. But when an async job is running and a sync job is
    also running at the time of the check, priv->job.active is not
    QEMU_JOB_NONE. So qemuDomainObjEnterMonitorInternal() cannot tell whether
    its caller is the async job; instead, we must put the burden on the caller
    to tell us when an async command wants to do a nested job.

Comment 6 dyuan 2011-08-02 07:56:33 UTC
Reproduced this bug with libvirt-0.9.4-0rc2.el6 and verified the fix with libvirt-0.9.4-0rc1.2.el6.

Comment 8 dyuan 2011-08-05 10:31:48 UTC
Moved it to VERIFIED according to comment 6.

Comment 9 errata-xmlrpc 2011-12-06 11:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html