Bug 1783815

| Field | Value |
|---|---|
| Summary | HA did not work, vm killed by sanlock recovery was reported as shutdown from within the guest |
| Product | Red Hat Enterprise Virtualization Manager |
| Reporter | Germano Veit Michel <gveitmic> |
| Component | vdsm |
| Assignee | Andrej Krejcir <akrejcir> |
| Status | CLOSED ERRATA |
| QA Contact | Polina <pagranat> |
| Severity | high |
| Priority | high |
| Docs Contact | |
| Version | 4.3.5 |
| CC | ahadas, akrejcir, aperotti, dconsoli, jortialc, lsurette, mavital, mkalinin, nsoffer, pelauter, rdlugyhe, srevivo, tnisan, ycui |
| Target Milestone | ovirt-4.4.1 |
| Flags | lsvaty: testing_plan_complete- |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | rhv-4.4.0-29 |
| Doc Type | Bug Fix |
| Doc Text | Previously, if a virtual machine (VM) was forcibly shut down by SIGTERM, in some cases VDSM did not handle the libvirt shutdown event that contained information about why the VM was shut down and evaluated it as if the guest had initiated a clean shutdown. The current release fixes this issue: VDSM handles the shutdown event, and the Manager restarts the high-availability VMs as expected. |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2020-08-04 13:27:22 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | Virt |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1789090 |
Description
Germano Veit Michel, 2019-12-16 01:07:46 UTC
Nir, does this fall within the Storage realm or Virt (fencing)?

(In reply to Tal Nisan from comment #5)
> Nir, does this fall within the Storage realm or Virt (fencing)?

This is VM life-cycle, so Virt. Storage provides the fencing capability by killing the VM. Libvirt should report to VDSM that the VM was killed, and VDSM should report the event to the engine, so it can restart the VM on another host. Because VDSM was also terminated by sanlock, it may miss events from libvirt; in that case it needs to be able to get the status of the VMs after VDSM starts up, and report the status to the engine.

I'd ask why the host wasn't fenced when it lost its connection to storage (either fencing from infra or storage fencing), but for the HA part, we should behave better. Andrej, any thoughts?

When the qemu process receives SIGTERM, it shuts down cleanly, but the guest OS does not. VDSM and the engine interpret this as a shutdown from the guest, so the HA VM is not restarted. I think it would be better if HA VMs were restarted in this case. I will check whether there is a way to distinguish between a shutdown from the guest OS and qemu receiving SIGTERM.

The 'Shutdown' event from libvirt has a parameter that specifies whether the VM was shut down from the guest or from the host:
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainEventShutdownDetailType
It was implemented in Bug 1384007. Currently, VDSM ignores this event. (An illustrative sketch of handling this event detail appears at the end of this report.)

Verified on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-31

1. Create four HA VMs (Server with/without lease, Desktop with/without lease) and run them on the same host.
2. Enter the host and send kill to the vdsm PID and to the four qemu PIDs.

```
[root@ocelot02 ~]# systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) since Wed 2020-04-22 14:48:38 IDT; 905ms ago
  Process: 1023401 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsmd (code=exited, status=0/SUCCESS)
  Process: 1023333 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 1023401 (code=exited, status=0/SUCCESS)
```

Result: all four VMs are restarted.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246
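As background for the fix discussed above, here is a minimal illustrative sketch (not the actual VDSM code) of how a libvirt lifecycle-event callback can use the detail value of the Shutdown event (virDomainEventShutdownDetailType) to tell a guest-initiated shutdown apart from a host-initiated one, such as qemu receiving SIGTERM during sanlock recovery. The callback structure and connection handling are assumptions for demonstration only.

```python
# Illustrative sketch only: listen for libvirt lifecycle events and inspect the
# SHUTDOWN detail to distinguish a guest-initiated shutdown from a host-initiated
# one (e.g. qemu receiving SIGTERM). This is not VDSM's implementation.
import libvirt


def lifecycle_cb(conn, dom, event, detail, opaque):
    if event != libvirt.VIR_DOMAIN_EVENT_SHUTDOWN:
        return
    if detail == libvirt.VIR_DOMAIN_EVENT_SHUTDOWN_GUEST:
        # Clean shutdown initiated from inside the guest OS.
        print("%s: shutdown initiated by the guest" % dom.name())
    elif detail == libvirt.VIR_DOMAIN_EVENT_SHUTDOWN_HOST:
        # Shutdown requested from the host side; for an HA VM this is the case
        # where the engine should restart the VM instead of treating it as a
        # user-initiated shutdown.
        print("%s: shutdown requested from the host" % dom.name())


def main():
    # Register the default event loop implementation before opening the connection.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open("qemu:///system")
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()


if __name__ == "__main__":
    main()
```

Reacting to VIR_DOMAIN_EVENT_SHUTDOWN_HOST rather than treating every shutdown as guest-initiated is what allows the Manager to restart high-availability VMs on another host, as described in the Doc Text above.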