Bug 1384007
Summary: | The lifecycle event for Guest OS Shutdown is not distinguishable from a qemu process that was quit with SIG_TERM | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Vinzenz Feenstra [evilissimo] <vfeenstr> | |
Component: | libvirt | Assignee: | Martin Kletzander <mkletzan> | |
Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 7.2 | CC: | dyuan, eblake, fromani, jsuchane, lcheng, michal.skrivanek, mkletzan, mzhan, nsoffer, pkrempa, rbalakri, vfeenstr, xuzhang, yafu, yanqzhan | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | libvirt-3.2.0-7.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1418927 (view as bug list) | Environment: | ||
Last Closed: | 2017-08-01 17:16:43 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1418927 | |||
Bug Blocks: |
Description
Vinzenz Feenstra [evilissimo]
2016-10-12 10:50:49 UTC
Just a quick test for the two cases shows:

On 'virsh shutdown domain' QEMU sends us:

  SHUTDOWN
  STOP
  SHUTDOWN

whereas on 'kill $domain_pid' we get:

  SHUTDOWN

So there are two problems. First is that the difference comes *after* the point where we consider the domain to be shut down (and we have already emitted the libvirt event). More importantly, this is not guaranteed to happen (I just tried it once) and none of the differences (the stop or the second shutdown) make sense after SHUTDOWN was received.

The only way to differentiate this would be by adding information to the event in QEMU. So I'm cloning this BZ to QEMU for the requested support.

In the meantime, if you really need to somehow at least try to distinguish clean and non-clean shutdown, you can, as a workaround, check for the qemu agent disconnection event. For SIGTERM you should not get it.

This is important for RHV when using a VM lease:
https://www.ovirt.org/develop/release-management/features/storage/vm-leases/

If sanlock cannot renew the lease and terminates the VM, libvirt should return abnormal termination. Otherwise, the RHV HA mechanism will treat the event as normal termination within the guest, and will not start the VM on another host.

If sanlock cannot terminate qemu with SIGTERM, it will send SIGKILL. In this case libvirt cannot depend on anything from qemu. Libvirt can use output from qemu to detect normal shutdown, and if there is no info from qemu, treat it as abnormal termination.

Martin, do you think we can track this issue in this bug, or should we open a new bug focused on the sanlock use case?

(In reply to Nir Soffer from comment #4)
What you describe is slightly different. It is of course possible to distinguish between SIGKILL'd and SIGTERM'd QEMU, that's not a problem. If libvirt doesn't get any info from the VM and the process disappears, then the state reason is "crashed" instead of "shutdown".
What this BZ is about is that SIGTERM and normal shutdown are not distinguishable. For both of these events we get the same information from qemu, so it should be

@Vincenzo: I am wondering if, as a workaround or just for thought, it would be enough to use your guest agent, which would send you info when it's being stopped. That would be a normal shutdown for you, and no info would mean a killed process.

This is what we do as a temporary workaround. In the long run, however, I am expecting libvirt and qemu to report this difference in some way.

(In reply to Martin Kletzander from comment #5)
> @Vincenzo: I am wondering if, as a workaround or just for thought, it would
> be enough to use your guest agent that would send you info when it's being
> stopped as that would be a normal shutdown for you and no info would be
> killed process.

It's not a great workaround, as it is a simple system service which can stop and restart at any time.

Sure, I get that you need a better solution, but the fact that my idea works just proves that I understood exactly what you need. Thanks a lot.
Depending on upstream reaction, qemu 2.10 may start advertising when a signal was the cause of a SHUTDOWN event:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01098.html

Preliminary patch for libvirt posted here:
https://www.redhat.com/archives/libvir-list/2017-April/msg00622.html

Another version posted upstream:
https://www.redhat.com/archives/libvir-list/2017-May/msg00540.html

Reproduced with libvirt-2.0.0-10.el7_3.9.x86_64, qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

Steps:
1. # virsh shutdown rhel7.3
   Domain rhel7.3 is being shutdown

   # virsh event --all --loop
   event 'lifecycle' for domain rhel7.3: Shutdown Finished
   event 'lifecycle' for domain rhel7.3: Stopped Shutdown

2. # kill -15 $PID

   # virsh event --all --loop
   event 'lifecycle' for domain rhel7.3: Shutdown Finished
   event 'lifecycle' for domain rhel7.3: Stopped Shutdown

Verify on rhel7.4 with libvirt-3.2.0-10.el7.x86_64, qemu-kvm-rhev-2.9.0-10.el7.x86_64

Steps:
1. # virsh shutdown V (or in guest: shutdown -h; poweroff)

   # virsh event V --loop --all
   event 'lifecycle' for domain V: Shutdown Finished after guest request
   event 'lifecycle' for domain V: Stopped Shutdown

2. # kill -1/2/15 $PID

   # virsh event V --loop --all
   event 'lifecycle' for domain V: Shutdown Finished after host request
   event 'lifecycle' for domain V: Stopped Shutdown

Mark as verified per comment 19.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:1846
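
For anyone consuming these events from a script, the following is a minimal Python sketch of how the 'virsh event' lines quoted in the verification steps could be classified. It assumes the line format exactly as quoted in this report. The function name classify_shutdown and the return labels are hypothetical and are not part of libvirt or virsh.

```python
import re

# Line format as quoted in this report, e.g.:
#   event 'lifecycle' for domain V: Shutdown Finished after guest request
# Assumption: other libvirt versions may word the detail differently.
LIFECYCLE_RE = re.compile(
    r"^event 'lifecycle' for domain (?P<domain>.+?): "
    r"(?P<event>\S+) (?P<detail>.+)$"
)


def classify_shutdown(line):
    """Classify one line of `virsh event` output (hypothetical helper).

    Returns "guest" or "host" when the Shutdown event names its origin,
    "unknown" for a Shutdown event without an origin (the pre-fix output),
    and None for any line that is not a Shutdown lifecycle event.
    """
    match = LIFECYCLE_RE.match(line.strip())
    if match is None or match.group("event") != "Shutdown":
        return None
    detail = match.group("detail")
    if detail == "Finished after guest request":
        return "guest"
    if detail == "Finished after host request":
        return "host"
    return "unknown"
```

With the pre-fix packages the helper can only report "unknown", which mirrors the original complaint. A management layer would presumably treat "host" as a candidate for abnormal termination, but that policy is outside the scope of this sketch.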