Description of problem:
See this upstream thread:
https://www.redhat.com/archives/libvir-list/2011-July/msg01969.html

Running a loop of 'virsh managedsave dom && virsh start dom' while virt-manager is connected to libvirtd can crash libvirtd, because both a sync job (the query commands used by virt-manager) and an async job (the managed save) can end up trying to use the qemu monitor at the same time.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-0rc2.el6.x86_64

How reproducible:
I was able to get a loop to crash within 10 iterations pre-patch; post-patch I got 20 iterations without failure.

Steps to Reproduce:
1. for i in `seq 20`; do virsh managedsave dom && virsh start dom || { echo failed on $i; break; }; done

Actual results:
error: Failed to save domain dom state
error: End of file while reading data: Input/output error
and libvirtd has crashed

Expected results:
no crash

Additional info:
proposed upstream patch:
https://www.redhat.com/archives/libvir-list/2011-July/msg02077.html

That thread mentioned another potential issue with killing libvirtd in the middle of a managed save, but I think it is a distinct issue and will be opening a second bz.
regression introduced in upstream commit 361842881e (after 0.9.3 but prior to 0.9.4-rc1).
see bug 727254 for another, less severe, issue noticed with managedsave while trying to work on the patch for this bug.
In POST:

commit 193cd0f3c879619619a3c35d25311e98693fe2ef
Author: Eric Blake <eblake>
Date:   Thu Jul 28 17:18:24 2011 -0600

    qemu: fix crash when mixing sync and async monitor jobs

    Currently, we attempt to run a sync job and an async job at the same
    time. This means that the monitor commands for the two jobs can be
    run in any order.

    In the function qemuDomainObjEnterMonitorInternal():
        if (priv->job.active == QEMU_JOB_NONE && priv->job.asyncJob) {
            if (qemuDomainObjBeginNestedJob(driver, obj) < 0)
    we check whether the caller is an async job by priv->job.active and
    priv->job.asyncJob. But when an async job is running, and a sync job
    is also running at the time of the check, then priv->job.active
    is not QEMU_JOB_NONE. So we cannot check whether the caller is an
    async job in the function qemuDomainObjEnterMonitorInternal(), and
    must instead put the burden on the caller to tell us when an async
    command wants to do a nested job.
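The reasoning in the commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the patch: every `sketch_*` name below is hypothetical and merely stands in for libvirt's job-tracking state. It contrasts inferring "the caller is the async job" from shared state with having the caller state it explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's job state;
 * these are NOT libvirt's real types. */
enum sketch_job { SKETCH_JOB_NONE = 0, SKETCH_JOB_QUERY };
enum sketch_async_job { SKETCH_ASYNC_JOB_NONE = 0, SKETCH_ASYNC_JOB_SAVE };

struct sketch_job_state {
    enum sketch_job active;           /* sync job currently running, if any */
    enum sketch_async_job async_job;  /* async job currently running, if any */
};

/* Heuristic described in the commit message: guess that the caller is the
 * async job whenever no sync job is active and an async job exists. */
bool sketch_nested_inferred(const struct sketch_job_state *job)
{
    return job->active == SKETCH_JOB_NONE &&
           job->async_job != SKETCH_ASYNC_JOB_NONE;
}

/* Alternative described in the commit message: the caller says which async
 * job (if any) it is acting for, so a concurrent sync job cannot hide it. */
bool sketch_nested_explicit(const struct sketch_job_state *job,
                            enum sketch_async_job caller_async_job)
{
    return caller_async_job != SKETCH_ASYNC_JOB_NONE &&
           caller_async_job == job->async_job;
}

/* Walks through the problem case: a managed save (async) is in progress
 * while a query (sync) job is also active. */
void sketch_self_check(void)
{
    struct sketch_job_state job = { SKETCH_JOB_QUERY, SKETCH_ASYNC_JOB_SAVE };

    /* The inference fails: the async caller is not recognised. */
    assert(!sketch_nested_inferred(&job));
    /* The explicit form still recognises the async caller... */
    assert(sketch_nested_explicit(&job, SKETCH_ASYNC_JOB_SAVE));
    /* ...and does not treat a plain sync caller as one. */
    assert(!sketch_nested_explicit(&job, SKETCH_ASYNC_JOB_NONE));
}
```

In this model the inferred check returns false exactly in the situation the bug describes, so the async caller would enter the monitor without first taking a nested job.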
Reproduced this bug with libvirt-0.9.4-0rc2.el6 and verified that it passes with libvirt-0.9.4-0rc1.2.el6.
Moved it to VERIFIED according to comment 6.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html