Description of problem:

I get this error intermittently when calling virDomainDestroyFlags with flags=VIR_DOMAIN_DESTROY_GRACEFUL.

Fatal error: exception Guestfs.Error("could not destroy libvirt domain: Requested operation is not valid: domain is not running [code=55 domain=10]")

The domain has possibly exited itself before we call virDomainDestroyFlags.

However, and this is strange: if I add a sleep to the guest so it doesn't shut down immediately, eg. 'sleep 30', then virDomainDestroyFlags will hang for 30 seconds, and *then* give the same error as above.

There are no errors in the qemu log file. qemu does not appear to be segfaulting (so different from bug 853369).

Version-Release number of selected component (if applicable):

libvirt-daemon-1.1.3-2.fc21.x86_64
qemu-1.4.2-11.fc19.x86_64
kernel-3.10.9-200.fc19.x86_64

(Will try updating to qemu from Rawhide shortly)

How reproducible:

Not reliably reproducible. Right now on my laptop it's happening 90% of the time, but usually it doesn't happen at all.

Steps to Reproduce:

1. Run a virt tool such as virt-resize.
Some more random data points:

If the machine is loaded with disk activity, then the bug doesn't happen. It seems like a race condition of some sort.

Upgrading to qemu-1.6.0-10.fc21 does appear to have made the bug happen less often.

I'm afraid I don't have a good reproducer for this. It may be connected with ./configure --enable-valgrind-daemon which is a debugging option that changes the order of shutdown: in production builds we always rely on libvirt actively killing qemu, but when --enable-valgrind-daemon is used, the appliance can shut itself down. Production builds would never have this option enabled.

For reference the command I'm actually using to reproduce this locally is:

LIBGUESTFS_DEBUG=1 ./run ./builder/website/test-guest.sh fedora-18
(In reply to Richard W.M. Jones from comment #0)
> However, and this is strange: if I add a sleep to the guest
> so it doesn't shut down immediately, eg. 'sleep 30', then
> virDomainDestroyFlags will hang for 30 seconds, and *then*
> give the same error as above.

Note: This part is NOT strange. The hang here was in libguestfs. Just ignore this paragraph in the bug description.
On the surface this doesn't really look like a bug. If the guest is not running when virDomainDestroyFlags is called, then getting back this error code is expected. So the real question here is why QEMU is exited before libguestfs expected it to.
Can you capture a trace of libvirtd with the following log settings

LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup"

while triggering the 'virDomainDestroyFlags' API, and also provide the corresponding /var/log/libvirt/qemu/$GUEST.log? The timestamps between the two may let us identify the sequencing.
Unfortunately, the overhead of debugging makes the bug go away ...

Here is the script I'm using:

-------------------
vfile=/tmp/libvirt.log
gfile=/tmp/guestfs.log
rm -f $vfile $gfile
dir=$HOME/d/libguestfs
export LIBVIRT_DEBUG=1
export LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup 1:file:$vfile"
export LIBGUESTFS_DEBUG=1
export LIBGUESTFS_TRACE=1
$dir/run $dir/builder/virt-builder \
  fedora-19 --output /tmp/fedora-19.img --size 10G |& tee $gfile
ls -l $vfile $gfile
-------------------

Why does that script never write to libvirt.log?

(In reply to Daniel Berrange from comment #3)
> On the surface this doesn't really look like a bug. If the guest is not
> running when virDomainDestroyFlags is called, then getting back this error
> code is expected. So the real question here is why QEMU is exited before
> libguestfs expected it to.

As I mentioned on IRC:

(1) We need to find out if qemu segfaulted during shutdown. That's the reason for the graceful flag:

https://bugzilla.redhat.com/show_bug.cgi?id=853369#c12

(2) While it may be true that currently virDomainDestroyFlags acts like you've described, it's not useful behaviour. What we really want is more like how Unix kill + waitpid works, ie. you can kill a process and wait for its exit status, and that works even if the process exits itself before or between the two system calls.
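For illustration only, the kill + waitpid behaviour described in point (2) can be sketched in plain shell. This is not libvirt or libguestfs code; the variable names (pid, status) and the exit code 3 are made up for the example. The child here plays the part of a guest that shuts itself down before the parent gets round to killing it.

```shell
#!/bin/bash
# Child that exits by itself with status 3, before the parent tries to kill it.
( exit 3 ) &
pid=$!

# Give the child time to exit on its own (the "guest shut itself down" case).
sleep 1

# The kill may fail because the process has already gone.
# That failure is reported but not treated as fatal.
kill "$pid" 2>/dev/null || echo "kill: process $pid had already exited"

# wait still hands back the exit status of the child.
wait "$pid"
status=$?
echo "exit status: $status"
```

The point of the sketch is that a failed kill does not lose the exit status: the caller can still learn how the process ended, which is what is needed to detect a segfault during shutdown.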
My bad, I gave the wrong env variable name:

LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"
LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt/libvirtd.log"
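Applied to the reproducer script from the earlier comment, the corrected variable names might look roughly like this. It is only a sketch: it assumes the libvirtd instance that libguestfs talks to inherits these environment variables, and it reuses the /tmp/libvirt.log path from that script instead of the /var/log path given above.

```shell
# Log file path taken from the earlier reproducer script.
vfile=/tmp/libvirt.log

# Filters select which libvirt modules log at debug level (priority 1).
export LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"

# Outputs say where the log messages go; here, a plain file.
export LIBVIRT_LOG_OUTPUTS="1:file:$vfile"
```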
For some reason this bug has started happening again.

libvirt-1.2.2-1.fc21.x86_64
qemu-1.7.0-5.fc21.x86_64

I'll see if I can collect some debug information this time ...
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle. Changing version to '22'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
Haven't heard much on this bug for a while, so I'm assuming it's gone. If anyone is still hitting this, please reopen.