Bug 1325468 (autostart-w-engine)
Summary: | [RFE] Autostart of VMs that are down (with Engine assistance - Engine has to be up) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | biholcomb | |
Component: | ovirt-engine | Assignee: | Andrej Krejcir <akrejcir> | |
Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 4.2.0 | CC: | achareka, akrejcir, b.bellec, bugs, bugzilla-redhat, crimson, dfediuck, dmc, fdelorey, klaas, linux, mavital, mgoldboi, michal.skrivanek, mjankula, mkalinin, msivak, mtessun, pelauter, pvilayat, rhodain, riehecky, s.danzi, s.kieske, solarflow99, warlord | |
Target Milestone: | ovirt-4.4.0 | Keywords: | FutureFeature, Improvement | |
Target Release: | --- | Flags: | lsvaty: testing_plan_complete- | |
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Enhancement | ||
Doc Text: |
After a high-availability virtual machine (HA VM) crashes, the RHV Manager tries to restart it indefinitely, at first with a short delay between restart attempts. After a specified number of failed retries, the delay becomes longer.
Also, the Manager starts crashed HA VMs in order of priority, delaying lower-priority VMs until higher-priority VMs are 'Up.'
The current release adds new configuration options:
* `RetryToRunAutoStartVmShortIntervalInSeconds`, the short delay, in seconds. The default value is `30`.
* `RetryToRunAutoStartVmLongIntervalInSeconds`, the long delay, in seconds. The default value is `1800`, which equals 30 minutes.
* `NumOfTriesToRunFailedAutoStartVmInShortIntervals`, the number of restart tries with short delays before switching to long delays. The default value is `10` tries.
* `MaxTimeAutoStartBlockedOnPriority`, the maximum time, in minutes, before starting a lower-priority VM. The default value is `10` minutes.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1607510 (view as bug list) | Environment: | ||
Last Closed: | 2020-08-04 13:16:05 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1801439 | |||
Bug Blocks: | 1607510, 1670339 |
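
The retry and priority behavior described in the Doc Text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's implementation: the class and function names are hypothetical, the exact boundary at which the engine switches from the short to the long interval is assumed, and so is the convention that a larger number means a higher priority. Only the four option names and their default values come from the Doc Text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoStartConfig:
    """Defaults are the values quoted in the Doc Text; the field names are hypothetical."""
    short_interval_s: int = 30    # RetryToRunAutoStartVmShortIntervalInSeconds
    long_interval_s: int = 1800   # RetryToRunAutoStartVmLongIntervalInSeconds
    short_tries: int = 10         # NumOfTriesToRunFailedAutoStartVmInShortIntervals
    max_blocked_min: int = 10     # MaxTimeAutoStartBlockedOnPriority


def retry_delay_seconds(retry_number, cfg=AutoStartConfig()):
    """Delay to wait before restart attempt `retry_number` (1-based).

    Assumption: retries 1..short_tries use the short interval and every
    later retry uses the long one. The engine's real boundary may differ.
    """
    if retry_number < 1:
        raise ValueError("retry_number must be >= 1")
    if retry_number <= cfg.short_tries:
        return cfg.short_interval_s
    return cfg.long_interval_s


def may_start_now(vm_priority, pending_priorities, minutes_blocked,
                  cfg=AutoStartConfig()):
    """Decide whether a crashed HA VM may be started right now.

    pending_priorities: priorities of other HA VMs that are not 'Up' yet.
    Assumption: a larger number means a higher priority. A VM waits while
    any higher-priority VM is pending, but no longer than max_blocked_min.
    """
    higher_pending = any(p > vm_priority for p in pending_priorities)
    return (not higher_pending) or minutes_blocked >= cfg.max_blocked_min


if __name__ == "__main__":
    # Print the delay schedule around the assumed switch-over point.
    for n in (1, 10, 11):
        print(n, retry_delay_seconds(n))
```

With the default values, a VM that keeps failing would be retried every 30 seconds for the first ten retries and every 30 minutes after that, and a low-priority VM would be held back for at most 10 minutes by higher-priority VMs that are still down.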
Description
biholcomb
2016-04-09 00:44:39 UTC
Bug #1166657 is similar. Another case explained in ml: "Pavel Gashev I'd like to see the autostart feature as well. In my case I need to autostart a virtual router VM at remote site. The issue is that oVirt can't see the remote host until the virtual router is started on this host. So HA is not an option."

ovirt has an advanced HA system that start the Self Hosted Engine VM. A solution could be extend this HA to start others VM if engine is down or not started yet.

(In reply to HWSD from comment #1)
> ovirt has an advanced HA system that start the Self Hosted Engine VM.
> A solution could be extend this HA to start others VM if engine is down or
> not started yet.

Or a similar method that, once the engine is started, will ensure that additional VMs are started, and if necessary start them in the correct order.

The way VMware does it is that the UI lets you move a VM up and down in a list among three categories. The 1st category is "start VMs in order", and you move a VM up or down in that list to manually control the order. The hypervisor will start the VMs and wait a specified period of time between (or possibly wait for the VM to come online). The 2nd category is an unordered start, so it's effectively a checkbox to auto-start the VM. The last category is a non-autostart list, which is effectively the checkbox unchecked.

It would be nice if oVirt had a similar feature:
* first, a checkbox to autostart a VM. This effectively gets us categories 2 and 3.
* second, the ability to specify a start order on a select number of VMs. This would be VMware category 1.

For most cases I think the engine can control this process. However there may be cases (e.g. the virtual router) where a host-specific local VM may need to be started asynchronously from the engine.

For my particular case I'm just looking at a single-host/node ovirt system so everything is local/locally-hosted.

This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

I am also surprised this feature is missing. I don't think the engine should control this however, because I want the VMs to autostart with the host regardless if the engine is online or not. This is how VMware does it, and this would be especially true for standalone hosts. Having to manually start each and every one of your VMs is just silly.

Duplicate of bug 817363 ?

(In reply to Yaniv Kaul from comment #5)
> Duplicate of bug 817363 ?

Nobody outside RH can tell, this bug is restricted. I cannot view it.

(In reply to Yaniv Kaul from comment #5)
> Duplicate of bug 817363 ?

This bug takes into account also the ordering of starting up of the VMs which seems to be an important aspect. I guess we can use this one since it's more comprehensive.

I think we can limit the initial implementation to 2 areas:
- autostart: will depend on pin to host (later can be extended with sanlock HA)
- ordering: additional factor to be added to VM<->VM affinity

The basic requirements should be:

preconditions:
Autostart:
- VM is pinned to single host
Ordering:
- VM is part of affinity group (ordering/dependency should be added to affinity)

UI/API:
Autostart: VM flag which is enabled only if VM is pinned to host
Ordering: should be part of affinity groups and relevant for VM to VM affinity.

Moran and Yaniv, can we please use this bug to track only the "autostart WITH engine" case? The mentioned #817363 (which needs an upstream clone it seems) tracks the "autostart W/O engine" RFE.

comment #4 can be handled using hosted engine, first the engine is started by the hosted engine tooling and then the engine starts the HA+autostart VMs.

I see a need for the hosted-engine autostart described in comment #9.
I had a situation last night where my primary oVirt cluster was hard-shutdown (fire in the building, fire department killed all power for safety). When it came back, only the engine started, so I had to start around 80 VMs individually. We have several old VMs (e.g. application servers that were on CentOS 5 and new CentOS 7 VMs had been built but the old VMs were kept around powered off for reference), so I had to know "these VMs should be up".

My ideal world would be that, at least optionally, the engine would keep track of which VMs were up, and in the event of a full unclean shutdown (power removed), the engine would attempt to start them after the cluster+engine was back up. Ordering control would be better (like I needed to bring up DNS and database VMs first), but at least some method to start all the VMs that were running when the system failed would be good.

(In reply to Derek Atkins from comment #2)
> (In reply to HWSD from comment #1)
> > ovirt has an advanced HA system that start the Self Hosted Engine VM.
> > A solution could be extend this HA to start others VM if engine is down or
> > not started yet.
>
> Or a similar method that, once the engine is started, will ensure that
> additional VMs are started, and if necessary start them in the correct order.
>
> The way VMware does it is that the UI lets you move a VM up and down in a
> list among three categories. The 1st category is "start VMs in order", and
> you move a VM up or down in that list to manually control the order. The
> hypervisor will start the VMs and wait a specified period of time between
> (or possibly wait for the VM to come online). The 2nd category is an
> unordered start, so it's effectively a checkbox to auto-start the VM. The
> last category is a non-autostart list, which is effectively the checkbox
> unchecked.
>
> It would be nice if oVirt had a similar feature:
> * first, a checkbox to autostart a VM. This effectively gets us categories
> 2 and 3.
> * second, the ability to specify a start order on a select number of VMs.
> This would be VMware category 1.
>
> For most cases I think the engine can control this process. However there
> may be cases (e.g. the virtual router) where a host-specific local VM may
> need to be started asynchronously from the engine.
>
> For my particular case I'm just looking at a single-host/node ovirt system
> so everything is local/locally-hosted.

100% this, having all VMs start while nice, doesn't help in situations like wanting to make sure puppet is up, then start DBs before web front ends, etc. Also comment #9 supports this more. I'm not adding more information, simply adding to the "This would be amazing"

(In reply to Chris Adams from comment #10)

btw if you just need to make sure a set of VMs is up and running you can just write a script using REST API to check they are up every few minutes (or just a single check, like for the # of running VMs) and start them when needed

Or use High Availability on those VMs.

(In reply to Michal Skrivanek from comment #12)
> Or use High Availability on those VMs.

Except that HA does not work on a single-host, hosted-engine ovirt system (because there is no way to turn on HA in that situation -- or at least as of 4.0.x there was no way to turn it on; has that changed?). So right now a script is the only solution, but a script cannot be managed through the ovirt engine UI.

(In reply to Derek Atkins from comment #13)
> (In reply to Michal Skrivanek from comment #12)
>
> > Or use High Availability on those VMs.
>
> Except that HA does not work on a single-host, hosted-engine ovirt system
> (because there is no way to turn on HA in that situation -- or at least as
> of 4.0.x there was no way to turn it on; has that changed?).

HA can be enabled in 4.2 (since https://gerrit.ovirt.org/#/c/82014/), and should restart the VM on crash even when you have just a single host.
It may help, though it is possible it's still not going to work as intended when the system boots up. Would be great if you can try it out.

Correct. This bug only covers the WITH engine available cases which is useful in combination with hosted engine. The case of no engine is tracked in bug 817363.

current design is to piggy back to HA VM functionality, all non-running HA VMs with lease terminated improperly (without a clean shut down) will be started upon engine startup. Without any particular order

*** Bug 1166657 has been marked as a duplicate of this bug. ***

*** Bug 1108678 has been marked as a duplicate of this bug. ***

*** Bug 1269908 has been marked as a duplicate of this bug. ***

*** Bug 1607510 has been marked as a duplicate of this bug. ***

WARN: Bug status (ON_QA) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the following should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops

tested on http://bob-dr.lab.eng.brq.redhat.com/builds/4.4/rhv-4.4.0-18 according to the attached polarion cases.

please look at the case with the host storage blocking where HA VMs are restarted according to the Resume Behavior. In this case, sometimes the low_priority VMs are restarted before the medium. I also see this behavior in another scenario (sent in email):

The test: I have four HA VMs with high priority named high_1, high_2, high_3, high_4; four HA VMs with medium priority named medium_1, medium_2, medium_3, medium_4; four HA VMs with low priority named low_priority_1, low_priority_2, low_priority_3, low_priority_4. Running on the same host1 (two other hosts are in maintenance). Send poweroff with the powermgmnt to the host, wait for a while, then start the host again. The VMs are started, but sometimes the low goes before the medium (it is never messed with the high - all the high priority VMs are started first).
[root@compute-ge-6 ovirt-engine]# tail -f engine.log |grep "Trying to restart VM"
2020-02-09 15:27:21,080+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [2a9cacc7] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1
2020-02-09 15:27:21,480+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [79c46e1e] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1
2020-02-09 15:27:22,060+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [2ff57f0a] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1
2020-02-09 15:28:31,578+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1
2020-02-09 15:28:31,597+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1
2020-02-09 15:28:31,615+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1
2020-02-09 15:28:32,517+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83) [63b1aa62] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1
2020-02-09 15:29:46,857+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1
2020-02-09 15:29:47,824+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-27) [55b821aa] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1
2020-02-09 15:31:02,135+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-30) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1
2020-02-09 15:31:03,050+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-52) [3b3fa31b] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1
2020-02-09 15:32:17,504+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1
2020-02-09 15:32:18,331+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [30747637] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1
2020-02-09 15:33:32,776+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1
2020-02-09 15:33:33,703+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [3c6aa9b2] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1
2020-02-09 15:33:33,965+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [3a4b4b22] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_2 on Host host_mixed_1
2020-02-09 15:33:34,299+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [4ff0c879] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_3 on Host host_mixed_1
2020-02-09 15:33:34,559+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [a1bfcac] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_3 on Host host_mixed_1
2020-02-09 15:33:34,828+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [4f5ef5cf] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_4 on Host host_mixed_1
2020-02-09 15:34:33,908+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-7) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1
2020-02-09 15:34:49,009+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_4 on Host host_mixed_1
2020-02-09 15:34:49,026+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_3 on Host host_mixed_1
2020-02-09 15:34:49,039+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_2 on Host host_mixed_1
2020-02-09 15:34:49,062+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_3 on Host host_mixed_1

I re-assign the bz for your investigation.

re-tested as described in https://bugzilla.redhat.com/show_bug.cgi?id=1801439#c6. The behavior is correct

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247 |
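
The ordering check done by eye on the engine.log excerpt in the QA comment above could also be scripted. The following is a rough sketch, not a tool used in this bug: the function names are hypothetical, it assumes the log lines are already in chronological order (as `tail -f` prints them), and it infers each VM's priority from the `high_` / `medium_` / `low_priority_` name prefixes used in that particular test setup.

```python
import re

# Priority rank inferred from the VM name prefix used in the QA test
# (lower rank = should be restarted earlier). Specific to that test setup.
PRIORITY_BY_PREFIX = {"high": 0, "medium": 1, "low": 2}

# Matches the "Trying to restart VM <name> on Host ..." audit log lines.
RESTART_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\S*\s.*"
    r"Trying to restart VM (?P<vm>\S+) on Host"
)


def first_attempts(lines):
    """Return [(timestamp, vm_name)] for the first restart attempt of each VM."""
    seen = set()
    attempts = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match and match["vm"] not in seen:
            seen.add(match["vm"])
            attempts.append((match["ts"], match["vm"]))
    return attempts


def priority_of(vm_name):
    """Map a VM name such as 'low_priority_1' to its priority rank."""
    return PRIORITY_BY_PREFIX[vm_name.split("_", 1)[0]]


def out_of_order(attempts):
    """Return (earlier_vm, later_vm) pairs where the earlier VM has lower priority."""
    violations = []
    for i, (_, earlier) in enumerate(attempts):
        for _, later in attempts[i + 1:]:
            if priority_of(later) < priority_of(earlier):
                violations.append((earlier, later))
    return violations


# Three lines taken from the excerpt above.
sample = [
    "2020-02-09 15:29:46,857+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1",
    "2020-02-09 15:32:18,331+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [30747637] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1",
    "2020-02-09 15:33:33,703+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [3c6aa9b2] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1",
]

if __name__ == "__main__":
    for earlier, later in out_of_order(first_attempts(sample)):
        print(f"{earlier} was tried before higher-priority {later}")
```

Only the first attempt per VM is compared, because repeated attempts for the same VM are retries rather than new ordering decisions.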