Bug 1020216 - libvirt fails to shut down domain: could not destroy libvirt domain: Requested operation is not valid: domain is not running
Status: CLOSED DEFERRED
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2013-10-17 06:08 EDT by Richard W.M. Jones
Modified: 2016-04-26 10:12 EDT (History)

Doc Type: Bug Fix
Last Closed: 2015-09-21 18:09:03 EDT
Type: Bug


Description Richard W.M. Jones 2013-10-17 06:08:59 EDT
Description of problem:

I get this error intermittently when calling virDomainDestroyFlags
with flags=VIR_DOMAIN_DESTROY_GRACEFUL.

Fatal error: exception Guestfs.Error("could not destroy libvirt domain: Requested operation is not valid: domain is not running [code=55 domain=10]")

The domain may have exited on its own before we call
virDomainDestroyFlags.
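
The "[code=55 domain=10]" suffix in the error above can be decoded mechanically. The following is a minimal Python sketch, not part of libguestfs or libvirt: decode_libvirt_suffix is a hypothetical helper, and the two lookup tables contain only the two numeric values that appear in this report (55 is VIR_ERR_OPERATION_INVALID and 10 is VIR_FROM_QEMU in libvirt's virterror.h).

```python
import re

# Only the values seen in this report are listed; any other number is
# reported as unknown rather than guessed.
VIR_ERR_CODES = {55: "VIR_ERR_OPERATION_INVALID"}
VIR_ERR_DOMAINS = {10: "VIR_FROM_QEMU"}

SUFFIX = re.compile(r"\[code=(\d+) domain=(\d+)\]")


def decode_libvirt_suffix(message):
    """Return (code_name, domain_name) for a '[code=N domain=M]' suffix.

    Returns None when the message carries no such suffix.
    """
    m = SUFFIX.search(message)
    if m is None:
        return None
    code, domain = int(m.group(1)), int(m.group(2))
    return (
        VIR_ERR_CODES.get(code, f"unknown({code})"),
        VIR_ERR_DOMAINS.get(domain, f"unknown({domain})"),
    )


msg = (
    "could not destroy libvirt domain: Requested operation is not valid: "
    "domain is not running [code=55 domain=10]"
)
decoded = decode_libvirt_suffix(msg)
print(decoded)
```

So the error is an "operation invalid" error raised by the QEMU driver, which fits the guess that the domain was already gone when the call arrived.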

However, and this is strange: if I add a sleep to the guest
so it doesn't shut down immediately, eg. 'sleep 30', then
virDomainDestroyFlags will hang for 30 seconds, and *then*
give the same error as above.

There are no errors in the qemu log file.

qemu does not appear to be segfaulting (so different from bug 853369).

Version-Release number of selected component (if applicable):

libvirt-daemon-1.1.3-2.fc21.x86_64
qemu-1.4.2-11.fc19.x86_64
kernel-3.10.9-200.fc19.x86_64

(Will try updating to qemu from Rawhide shortly)

How reproducible:

Not reliably reproducible.  Right now on my laptop it's happening
90% of the time, but usually it doesn't happen at all.

Steps to Reproduce:
1. Run a virt tool such as virt-resize.
Comment 1 Richard W.M. Jones 2013-10-17 07:23:27 EDT
Some more random data points:

If the machine is loaded with disk activity, then the bug doesn't
happen.  It seems like a race condition of some sort.

Upgrading to qemu-1.6.0-10.fc21 does appear to have made the bug
happen less often.

I'm afraid I don't have a good reproducer for this.  It may
be connected with ./configure --enable-valgrind-daemon which is
a debugging option that changes the order of shutdown: in production
builds we always rely on libvirt actively killing qemu, but when
--enable-valgrind-daemon is used, the appliance can shut itself
down.  Production builds would never have this option enabled.

For reference the command I'm actually using to reproduce this locally is:

LIBGUESTFS_DEBUG=1 ./run ./builder/website/test-guest.sh fedora-18
Comment 2 Richard W.M. Jones 2013-10-18 08:43:39 EDT
(In reply to Richard W.M. Jones from comment #0)
> However, and this is strange: if I add a sleep to the guest
> so it doesn't shut down immediately, eg. 'sleep 30', then
> virDomainDestroyFlags will hang for 30 seconds, and *then*
> give the same error as above.

Note: This part is NOT strange.  The hang here was in libguestfs.
Just ignore this paragraph in the bug description.
Comment 3 Daniel Berrange 2013-10-18 08:45:40 EDT
On the surface this doesn't really look like a bug. If the guest is not running when virDomainDestroyFlags is called, then getting back this error code is expected. So the real question here is why QEMU exited before libguestfs expected it to.
Comment 4 Daniel Berrange 2013-10-18 08:49:11 EDT
Can you capture a trace of libvirtd with the following log settings:

  LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup"

while triggering the 'virDomainDestroyFlags' API, and also provide the corresponding /var/log/libvirt/qemu/$GUEST.log?  The timestamps in the two logs may let us identify the sequencing.
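
One way to compare the two logs is to interleave their lines by timestamp. The sketch below is only an illustration under stated assumptions: it assumes both logs start each line with a "YYYY-MM-DD HH:MM:SS.mmm+0000: " prefix, the helper names stamped and interleave are hypothetical, and the two sample lines are made up for the example rather than taken from a real run.

```python
import re
from datetime import datetime

# Assumed prefix: "2013-10-18 13:49:52.500+0000: " (UTC offset hard-coded).
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000: ")


def stamped(lines, source):
    """Yield (timestamp, source, line) for every line with a leading timestamp."""
    for line in lines:
        m = TS.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            yield (when, source, line.rstrip("\n"))


def interleave(libvirtd_lines, qemu_lines):
    """Merge both logs into one list ordered by timestamp."""
    merged = list(stamped(libvirtd_lines, "libvirtd"))
    merged += list(stamped(qemu_lines, "qemu"))
    return sorted(merged)


# Hypothetical sample lines, only to show the ordering.
libvirtd_sample = ["2013-10-18 13:49:52.500+0000: 1234: debug : (hypothetical) destroy requested\n"]
qemu_sample = ["2013-10-18 13:49:52.100+0000: shutting down\n"]

merged = interleave(libvirtd_sample, qemu_sample)
for when, source, line in merged:
    print(source, line)
```

In this made-up sample the qemu line sorts first, which is the kind of ordering that would show QEMU going away before the destroy request was handled.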
Comment 5 Richard W.M. Jones 2013-10-18 09:49:52 EDT
Unfortunately, the overhead of debugging makes the bug go away ...

Here is the script I'm using:

-------------------
vfile=/tmp/libvirt.log
gfile=/tmp/guestfs.log
rm -f $vfile $gfile
dir=$HOME/d/libguestfs

export LIBVIRT_DEBUG=1
export LIBVIRT_LOG_OUTPUTS="1:qemu 1:command 1:security 1:process 1:cgroup 1:file:$vfile"
export LIBGUESTFS_DEBUG=1
export LIBGUESTFS_TRACE=1

$dir/run $dir/builder/virt-builder \
  fedora-19 --output /tmp/fedora-19.img --size 10G |& tee $gfile

ls -l $vfile $gfile
-------------------

Why does that script never write to libvirt.log?

(In reply to Daniel Berrange from comment #3)
> On the surface this doesn't really look like a bug. If the guest is not
> running when virDomainDestroyFlags is called, then getting back this error
> code is expected. So the real question here is why QEMU exited before
> libguestfs expected it to.

As I mentioned on IRC:

(1) We need to find out if qemu segfaulted during shutdown.
That's the reason for the graceful flag:
https://bugzilla.redhat.com/show_bug.cgi?id=853369#c12

(2) While it may be true that currently virDomainDestroyFlags acts
like you've described, it's not useful behaviour.  What we really
want is more like how Unix kill + waitpid works, ie. you can kill
a process and wait for its exit status, and that works even if the
process exits itself before or between the two system calls.
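
The kill + waitpid behaviour referred to in (2) can be illustrated with a minimal Python sketch. It uses only the standard os and signal modules on a Unix system and does not touch libvirt; kill_and_reap is a hypothetical helper name, and the pipe is there only to make sure the first child has already exited before it is signalled.

```python
import os
import signal
import time


def kill_and_reap(pid):
    """Signal a child, then collect its wait status, even if it already exited."""
    try:
        # Signalling a child that has exited but not been reaped succeeds.
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # only possible if something else reaped the child already
    _, status = os.waitpid(pid, 0)
    return status


# Case 1: the child exits by itself (status 3) before the parent kills it.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os._exit(3)
os.close(w)
os.read(r, 1)  # returns b"" once the exiting child has closed its copy of w
os.close(r)
status = kill_and_reap(pid)
print("exited by itself:", os.WIFEXITED(status), os.WEXITSTATUS(status))

# Case 2: the child is still running, so the signal is what terminates it.
pid2 = os.fork()
if pid2 == 0:
    time.sleep(60)
    os._exit(0)
status2 = kill_and_reap(pid2)
print("killed by signal:", os.WIFSIGNALED(status2))
```

In both cases the caller gets a wait status back and never sees an error merely because the child exited first, which is the property being asked for here.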
Comment 6 Daniel Berrange 2013-10-18 10:46:29 EDT
My bad, I gave the wrong env variable name:

  LIBVIRT_LOG_FILTERS="1:qemu 1:command 1:security 1:process 1:cgroup"
  LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt/libvirtd.log"
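
As a rough sketch of how a wrapper could apply the two variables above to a single command without exporting them globally: libvirt_debug_env is a hypothetical helper, the /tmp/libvirt.log path is the one from the script in comment 5, and the child process is a stand-in Python one-liner rather than a real virt tool. Which process actually honours the variables (the client library or the daemon) depends on which one inherits this environment, and that is not checked here.

```python
import os
import subprocess
import sys


def libvirt_debug_env(log_path, base=None):
    """Return a copy of the environment with libvirt debug logging variables set."""
    env = dict(os.environ if base is None else base)
    # Filters choose which categories are logged at level 1 (debug).
    env["LIBVIRT_LOG_FILTERS"] = "1:qemu 1:command 1:security 1:process 1:cgroup"
    # Outputs choose where the log messages go.
    env["LIBVIRT_LOG_OUTPUTS"] = f"1:file:{log_path}"
    return env


env = libvirt_debug_env("/tmp/libvirt.log")

# Stand-in child: prints what it inherited, to show the variable is passed on.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LIBVIRT_LOG_OUTPUTS'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
seen = child.stdout.strip()
print(seen)
```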
Comment 7 Richard W.M. Jones 2014-03-06 12:39:19 EST
For some reason this bug has started happening again.

libvirt-1.2.2-1.fc21.x86_64
qemu-1.7.0-5.fc21.x86_64

I'll see if I can collect some debug information this time ...
Comment 8 Jaroslav Reznik 2015-03-03 10:08:42 EST
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
Comment 9 Cole Robinson 2015-09-21 18:09:03 EDT
Haven't heard much on this bug for a while, so assuming it's gone. If anyone is still hitting this, please reopen.
