Bug 1149135
| Summary: | Prestarted VMs disappear from UI after failure to restore snapshot once VM turns from Unknown status to Down | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Tomas Dosek <tdosek> | |
| Component: | ovirt-engine | Assignee: | Arik <ahadas> | |
| Status: | CLOSED ERRATA | QA Contact: | Ilanit Stein <istein> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.4.0 | CC: | adahms, ahoness, bkorren, ecohen, eedri, iheim, lpeer, lsurette, mavital, ofrenkel, pablo.iranzo, rbalakri, Rhev-m-bugs, scohen, tdosek, yeylon | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | 3.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | virt | |||
| Fixed In Version: | org.ovirt.engine-root-3.5.0-19 | Doc Type: | Bug Fix | |
| Doc Text: | Previously, updating a virtual machine from a pool that was set to use the latest version of the template on which the pool is based would sometimes fail. This resulted in virtual machines that could not be updated to the latest version of the template being removed from the pool. Now, the version of the template for virtual machines in pools has been corrected so that virtual machines are no longer removed from pools under these circumstances. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1155557 (view as bug list) | Environment: | ||
| Last Closed: | 2015-02-11 18:09:27 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1133612, 1155557 | |||
Description
Tomas Dosek
2014-10-03 10:14:56 UTC
There was a simple test made. This looks like an automatic behavior of some kind. If the VMs are marked as protected against deletion, they end up being detached from pools.

First, I'll describe the flow:

1. The host that contained the prestarted VMs was rebooted.
2. When the host was up again, the monitoring detected that all the VMs that had run on it no longer exist.
3. The mechanism for 'cleaning up' VMs that went down can be invoked from two places:
   a. from the monitoring: the monitoring detects that a VM crashed and invokes ProcessVmOnDownCommand;
   b. from the run command: when trying to run a VM, we invoke ProcessVmOnDownCommand if the VM already has a stateless snapshot.
   In this case ProcessVmOnDownCommand was called by the second flow (b).
4. There is a bug where, when ProcessVmOnDownCommand is called from within RunVmCommand, the version is updated even when it is not supposed to be updated (as it was in this case, since there was no sub-version for the template the pool is based on). As part of the version update (UpdateVmVersionCommand), the VM is removed and added again - but when ProcessVmOnDownCommand is called from RunVmCommand, the VM is added based on the BLANK template instead of the correct template (another bug).
5. The removal of the VM succeeds, but the add command (AddVmAndAttachToPoolCommand) fails because of an invalid timezone in the BLANK template:

2014-10-02 12:24:39,176 WARN [org.ovirt.engine.core.bll.AddVmAndAttachToPoolCommand] (org.ovirt.thread.pool-4-thread-55) [2d0d06c5] CanDoAction of action AddVmAndAttachToPool failed. Reasons:VAR__ACTION__ADD,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_TIMEZONE

---------------------------------------------------------------------------

* The problem with the invalid timezone, which prevents adding the updated VM, was already solved by bz 1087917, so in 3.5 the VM won't disappear.
* Part of the problem is that ProcessVmOnDownCommand was called by RunVmCommand instead of by the monitoring. That happens because of the frequency at which the prestarted-VM monitoring (VmPoolMonitor) is invoked: the default setting is 5 minutes, but I see in the logs that sometimes it is invoked 3-4 times in a minute for several pools.

Tomas, I don't understand - VmPoolMonitorIntervalInMinutes is in minutes, so how is the frequency so high? If it is set to something like 0.3, then it is like setting it to 0, which means it is invoked in a loop - and that's bad. Maybe we should switch it to milliseconds if that's the case.

I'll make a fix for the case where ProcessVmOnDown is called from RunVmCommand, and that should solve the problem.

With regards to the first issue, we have a patch, so I could build a test package, but I prefer waiting for the second one to be sure we cover both scenarios.

Tested on vt9:

Steps to reproduce:
1. Have a pool of 10 pre-started VMs.
2. Reboot the hypervisor they run on.

These steps were run 3 times.

Results: Following the host reboot, the VMs in the pool go to Unknown status. After the host became up, all VMs became up.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html
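
As a side note on the VmPoolMonitorIntervalInMinutes question raised above: the sketch below is illustrative only and is not ovirt-engine code - the class name, the 0.3 value, and the use of ScheduledExecutorService are all hypothetical. It only demonstrates the pitfall being discussed: a fractional "minutes" setting truncated to whole minutes becomes 0 (an effectively continuous loop), whereas converting the value to a finer unit before scheduling preserves the intended delay.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of ovirt-engine.
public class PoolMonitorIntervalDemo {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical misconfigured value, analogous to setting an
        // "interval in minutes" option to 0.3.
        double configuredIntervalInMinutes = 0.3;

        // Truncating to whole minutes yields 0, so a scheduler driven by an
        // integer minute count would fire back-to-back, i.e. in a loop.
        long intervalMinutes = (long) configuredIntervalInMinutes;
        System.out.println("Truncated interval: " + intervalMinutes + " min");

        // Converting to milliseconds before scheduling keeps the intent
        // (one invocation roughly every 18 seconds).
        long intervalMillis = (long) (configuredIntervalInMinutes * 60_000);
        System.out.println("Interval in milliseconds: " + intervalMillis);

        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> System.out.println("pool monitor tick"),
                0, intervalMillis, TimeUnit.MILLISECONDS);

        // Observe a few ticks per minute instead of a tight loop, then stop.
        Thread.sleep(60_000);
        scheduler.shutdownNow();
    }
}
```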