Description of problem:

When trying to shut down a transient domain which *is* running I get:

*stdin*:31: libguestfs: error: could not destroy libvirt domain: Requested operation is not valid: domain is not running [code=55 domain=10]

This used to work in libvirt-0.10.0-0rc0.2.fc18.x86_64 but seems to have broken in Rawhide (libvirt-0.10.0-1.fc19.x86_64).

Version-Release number of selected component (if applicable):
libvirt-0.10.0-1.fc19.x86_64

How reproducible:
At least once.

Steps to Reproduce:
1. Build libguestfs in Rawhide.

Actual results:
http://kojipkgs.fedoraproject.org//work/tasks/8635/4438635/build.log
Second attempt at building also fails the same way:

http://kojipkgs.fedoraproject.org//work/tasks/931/4440931/build.log

As requested I'll try to get libvirt logs of this.
I couldn't reproduce the problem on top of libvirt git.
(In reply to comment #2)
> I couldn't reproduce the problem on top of libvirt git.

No libguestfs building surely, :-) Just trying to destroy a transient domain.
I get a similar but different error when running this on my local machine:

libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 1a | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: error: could not destroy libvirt domain: End of file while reading data: Input/output error [code=38 domain=7]
libguestfs-test-tool: shutdown failed
libguestfs: closing guestfs handle 0x665f80 (state 0)
Created attachment 608486 [details]
libvirt.log

Actually when running locally, I get both errors.

Attached is the libvirt log requested.

(In reply to comment #3)
> (In reply to comment #2)
> > I couldn't reproduce the problem on top of libvirt git.
>
> No libguestfs building surely, :-) Just trying to destroy a transient domain.

What did you do to try to reproduce this? libguestfs is a big C program and it creates and destroys the transient guest entirely through the API:

https://github.com/libguestfs/libguestfs/blob/87cb1549761c9441b0fa7ee9b6a85b8eeb164c5c/src/launch-libvirt.c

I'm pretty sure it's not libguestfs at fault here since (a) it works fine with other libvirt and (b) its use of the API is very simple.
Created attachment 608490 [details] libvirtd.log (log file from daemon)
Looking at it closer, I think what's happening is that qemu segfaults when libvirt sends it a signal to shut down. (That's a bug in qemu obviously). But then libvirt ought to be able to distinguish this case -- we really care if qemu segfaults, but it could indicate data integrity issues. I will try and catch the qemu segfault if I can.
(In reply to comment #7)
> qemu segfaults, but it could indicate data integrity issues.

s/but/because/
(In reply to comment #5)
> Created attachment 608486 [details]
> libvirt.log
>
> Actually when running locally, I get both errors.
>
> Attached is the libvirt log requested.
>
> (In reply to comment #3)
> > (In reply to comment #2)
> > > I couldn't reproduce the problem on top of libvirt git.
> >
> > No libguestfs building surely, :-) Just trying to destroy a transient domain.
>
> What did you do to try to reproduce this? libguestfs is a big
> C program and it creates and destroys the transient guest
> entirely through the API:

I simply used virsh to destroy a transient domain.

> https://github.com/libguestfs/libguestfs/blob/87cb1549761c9441b0fa7ee9b6a85b8eeb164c5c/src/launch-libvirt.c
> I'm pretty sure it's not libguestfs at fault here since
> (a) it works fine with other libvirt and (b) its use of the
> API is very simple.
So I've verified that what is happening is that qemu is segfaulting when libvirtd sends it a signal (new bug 853408). But definitely libvirt could improve the error message here. It's a good thing that libvirt indicates some sort of error, because we really want to know when this fails, but it should say something like 'qemu just segfaulted'.
From the POV of the virDomainDestroy() command, whether QEMU segfaults or shuts down cleanly is academic, since this command makes no guarantees about how QEMU is stopped, and indeed will even send SIGKILL to QEMU, which arguably has a similar effect to SEGV. So having QEMU SEGV after sending it a SIGTERM should be considered 'Success' for this function. As such we should not be returning the 'Operation is not valid' error code.
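For anyone following along, here is a rough POSIX sketch of how a parent process can tell a clean exit, a crash and a forced kill apart from the wait status of a child. It is only an illustration of the distinction being discussed, not libvirt's actual code: the `child_end` enum and both helper functions are hypothetical names made up for this example, and the choice of which signals count as a "crash" is an assumption.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification for this sketch; not a libvirt type. */
enum child_end {
    END_CLEAN_EXIT,   /* exited with status 0 */
    END_ERROR_EXIT,   /* exited with a non-zero status */
    END_CRASHED,      /* terminated by SIGSEGV, SIGABRT or SIGBUS */
    END_KILLED,       /* terminated by any other signal, e.g. SIGTERM or SIGKILL */
    END_UNKNOWN       /* fork/waitpid failed or status not understood */
};

/* Decode a status word as filled in by waitpid(). */
static enum child_end classify_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? END_CLEAN_EXIT : END_ERROR_EXIT;
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        /* Assumption: treat these three signals as "the process crashed". */
        if (sig == SIGSEGV || sig == SIGABRT || sig == SIGBUS)
            return END_CRASHED;
        return END_KILLED;
    }
    return END_UNKNOWN;
}

/* Fork a child that raises `sig` on itself, or exits with `code` when
 * sig == 0, then report how the child ended. */
static enum child_end run_and_classify(int sig, int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return END_UNKNOWN;
    if (pid == 0) {
        if (sig != 0) {
            /* Restore the default action in case it was inherited as
             * ignored; this call fails harmlessly for SIGKILL. */
            signal(sig, SIG_DFL);
            raise(sig);
        }
        _exit(code);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return END_UNKNOWN;
    return classify_status(status);
}
```

Calling `run_and_classify(SIGSEGV, 0)` and `run_and_classify(SIGKILL, 0)` lands in two different buckets, which is the information a caller would need if it wanted to report 'qemu just segfaulted' separately from an ordinary forced stop.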
13:07 <@rwmjones> danpb: what should I be using if I care about whether qemu shuts down without segfaulting?
13:09 < danpb> oh, pass the GRACEFUL flag to virDomainDestroy
13:09 < danpb> that means we'll only ever ask qemu to do a clean shutdown, and never try to SIGKILL it
13:10 < danpb> if we pass that flag, then you are right that we should report the SEGV as an error condition for virDomainDestroy
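To spell out the semantics described in that conversation, here is a tiny model of the decision in C. It is a sketch of my reading of the IRC log, not libvirt source: `qemu_end` and `destroy_result` are hypothetical names, and the handling of a forced kill under the graceful flag is an assumption. As far as I know, in the real C API the flag is spelled `VIR_DOMAIN_DESTROY_GRACEFUL` and is passed through `virDomainDestroyFlags()`.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are libvirt identifiers. */
enum qemu_end {
    QEMU_EXITED_CLEANLY,  /* qemu shut down by itself after the request */
    QEMU_CRASHED,         /* qemu died with SIGSEGV or similar */
    QEMU_FORCE_KILLED     /* qemu was stopped with SIGKILL */
};

/* Returns 0 for success and -1 for error, following the usual C convention.
 *
 * Without the graceful flag, destroy only promises that the domain is gone,
 * so every way of ending counts as success.  With the graceful flag, only a
 * clean shutdown was asked for, so anything else is reported as an error
 * (assumption: a forced kill is treated the same as a crash here). */
static int destroy_result(bool graceful, enum qemu_end how)
{
    if (!graceful)
        return 0;
    return how == QEMU_EXITED_CLEANLY ? 0 : -1;
}
```

Under this model a caller that cares about data integrity passes the graceful flag and treats a -1 return as a real failure, while a caller that just wants the guest gone uses the default behaviour and never sees the crash as an error.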
(In reply to comment #12)
> 13:07 <@rwmjones> danpb: what should I be using if I care about whether qemu
> shuts down without segfaulting?
> 13:09 < danpb> oh, pass the GRACEFUL flag to virDomainDestroy
> 13:09 < danpb> that means we'll only ever ask qemu to do a clean shutdown,
> and never try to SIGKILL it
> 13:10 < danpb> if we pass that flag, then you are right that we should
> report the SEGV as an error condition for virDomainDestroy

I have fixed this in libguestfs 1.19.39.
Closing / upstream based on comment 13.