Created attachment 1662275 [details] engine.log attached Description of problem: Sometimes prioritization of restarted HA VMs is messed up -low is restarted before medium Version-Release number of selected component (if applicable): Sometimes prioritization of restarted HA VMs is messed up -low is restarted before the medium How reproducible:always Steps to Reproduce: Scenario 1. 1. Run the following HA VMs with lease on the same host four HA VMs with high priority named high_1, high_2, high_3, high_4; four HA VMs with medium priority named medium_1, medium_2, medium_3, medium_4; four HA VMs with low priority named low_priority_1, low_priority_2, low_priority_3, low_priority_4. 2. Send poweroff with the powermgmnt to the host (VMs are down which is ok), wait for a while then start the host again. Scenario 2. 1. the same as in scenario 1 2. block the storage on the host with iptables command Expected results: VMs are restarted in the order: high priority, medium, low. Actual results: The VMs are started, but sometimes the medium goes before the low (it is never messed with the high - all the high priority VMs are started first). [root@compute-ge-6 ovirt-engine]# tail -f engine.log |grep "Trying to restart VM" 2020-02-09 15:27:21,080+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [2a9cacc7] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1 2020-02-09 15:27:21,480+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [79c46e1e] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1 2020-02-09 15:27:22,060+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [2ff57f0a] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1 2020-02-09 15:28:31,578+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1 2020-02-09 15:28:31,597+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1 2020-02-09 15:28:31,615+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-47) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1 2020-02-09 15:28:32,517+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83) [63b1aa62] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1 2020-02-09 15:29:46,857+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1 2020-02-09 15:29:47,824+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-27) [55b821aa] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1 2020-02-09 15:31:02,135+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-30) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1 2020-02-09 15:31:03,050+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-52) [3b3fa31b] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1 2020-02-09 15:32:17,504+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-55) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1 2020-02-09 15:32:18,331+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [30747637] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1 2020-02-09 15:33:32,776+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1 2020-02-09 15:33:33,703+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [3c6aa9b2] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1 2020-02-09 15:33:33,965+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [3a4b4b22] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_2 on Host host_mixed_1 2020-02-09 15:33:34,299+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [4ff0c879] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_3 on Host host_mixed_1 2020-02-09 15:33:34,559+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [a1bfcac] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_3 on Host host_mixed_1 2020-02-09 15:33:34,828+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [4f5ef5cf] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_4 on Host host_mixed_1 2020-02-09 15:34:33,908+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-7) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1 2020-02-09 15:34:49,009+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_4 on Host host_mixed_1 2020-02-09 15:34:49,026+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_3 on Host host_mixed_1 2020-02-09 15:34:49,039+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_2 on Host host_mixed_1 2020-02-09 15:34:49,062+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-91) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_3 on Host host_mixed_1 Additional info:
The lower priority HA VMs wait for the higher priority VMs to be Up, or until a configured timeout passes (config value 'MaxTimeAutoStartBlockedOnPriority'). By default the timeout is 5 minutes. Looking at the logs, it seems that the timeout is not long enough. For example VM 'low_priority_1' was added to the auto-start algorithm here: 2020-02-09 15:24:16,262+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-31) [] EVENT_ID: HA_VM_FAILED(9,602), Highly Available VM low_priority_1 failed. It will be restarted automatically. And it was started 8 minutes after, here: 2020-02-09 15:32:18,205+02 INFO [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [30747637] START, CreateVDSCommand( CreateVDSCommandParameters:{hostId='ccc22896-571f-45fc-94e1-b752cf0baf5e', vmId='4d601cb7-377f-4aed-baca-dc2f07b28543', vm='VM [low_priority_1]'}), log id: 7d3317a5 Can you test this again, but set the timeout to high enough value? Default timeout of 5 minutes may not be long enough, maybe we should increase it.
Hi Andrej, it looks like now we have 10 min, not 5 - engine-config --get MaxTimeAutoStartBlockedOnPriority MaxTimeAutoStartBlockedOnPriority: 600 version: general do we want to try longer timeout ?
Right, I was mistaken. But could you test it again with a longer timeout anyway? I cannot reproduce it in my environment, and then we will know if the issue is caused by the timeout or not. Try setting it to a high value, like 1 day.
Created attachment 1662756 [details] engine logs for two scenarios Tried two values - an hour and a day: engine-config --set MaxTimeAutoStartBlockedOnPriority=3600 (engine restart) engine-config --set MaxTimeAutoStartBlockedOnPriority=86400 (engine restart) still, low priority goes before medium. attached engine_log_two_scenarios_for_two_timeouts.tar.gz [root@compute-ge-6 ovirt-engine]# grep "Trying to restart VM" engine.log 2020-02-12 19:51:18,797+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-28) [318fc02d] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1 2020-02-12 19:51:19,325+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-28) [7d54fcdf] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1 2020-02-12 19:51:19,705+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-28) [5ed24e99] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1 2020-02-12 19:52:30,771+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_4 on Host host_mixed_1 2020-02-12 19:52:30,789+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_2 on Host host_mixed_1 2020-02-12 19:52:30,805+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_3 on Host host_mixed_1 2020-02-12 19:52:32,163+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-100) [c7d132] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1 2020-02-12 19:53:46,740+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-5) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM high_1 on Host host_mixed_1 2020-02-12 19:53:47,247+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [34de16e3] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1 2020-02-12 19:54:47,874+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-45) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_2 on Host host_mixed_1 2020-02-12 19:54:48,725+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [437c267c] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1 2020-02-12 19:56:03,634+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-18) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_4 on Host host_mixed_1 2020-02-12 19:56:04,971+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32) [2f589156] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1 2020-02-12 19:57:19,543+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-60) [] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_1 on Host host_mixed_1 2020-02-12 19:57:20,025+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [248a938a] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_1 on Host host_mixed_1 2020-02-12 19:57:20,392+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [3abacf4b] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_2 on Host host_mixed_1 2020-02-12 19:57:20,729+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [7341c6e8] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM medium_3 on Host host_mixed_1 2020-02-12 19:57:21,141+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [6a526936] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_3 on Host host_mixed_1 2020-02-12 19:57:21,506+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-24) [b90fd40] EVENT_ID: VDS_INITIATED_RUN_VM(506), Trying to restart VM low_priority_4 on Host host_mixed_1
Closed the bz as NOTABUG since there was an error in the test setup - two VMs named as a medium were configured with low priority which messed up the results. Re-tested for both scenarios (host poweroff and blocking storage). The results are correct