Description of problem:

Host initialization does not seem to check whether all required networks are configured and up, so the host can briefly switch to Up without proper networking configured or with required interfaces still down. Host monitoring then detects the missing/bad network and moves the host to Non-Operational. In conjunction with auto-recovery, this causes the host to loop through these states: Non Operational -> Up -> Non Operational -> Up ... This produces a lot of events, and the small window in which the host is incorrectly set to Up could also cause further issues.

In more detail:

1. Interface is down, host moved to Non-Operational

2020-06-01 12:16:19,143+10 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-33) [] Host 'host2.kvm' moved to Non-Operational state because interface/s which are down are needed by required network/s in the current cluster: 'eth2 (storage-B)'
2020-06-01 12:16:19,317+10 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-33) [6faa2e26] EVENT_ID: VDS_SET_NONOPERATIONAL(517), Host host2.kvm moved to Non-Operational state.

2. Auto-recovery kicks in

2020-06-01 12:20:00,057+10 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [] Autorecovering hosts id: ff49f88c-a98d-4aa5-9fff-831bd0b80b5d , name : host2.kvm

3. Host switches to Up

2020-06-01 12:20:04,874+10 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-90) [3647a97a] EVENT_ID: VDS_DETECTED(13), Status of host host2.kvm was set to Up.

4. Host monitoring detects the down interface again

2020-06-01 12:21:10,840+10 INFO [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-25) [] Host 'host2.kvm' moved to Non-Operational state because interface/s which are down are needed by required network/s in the current cluster: 'eth2 (storage-B)'

5. GOTO 2

Version-Release number of selected component (if applicable):
rhvm-4.3.9.4-11.el7.noarch (customer)
rhvm-4.4.0-0.34.master.el8ev.noarch (labs)

How reproducible:
Always

Steps to Reproduce:
1. Configure a required network on a host.
2. Pull the cable of the interface, or, if on nested KVM, toggle the vNIC link state with virsh (see the sketch below):
   $ virsh domif-getlink host2 vnet9

Actual results:
Host status loops: Up -> Non Operational -> Up -> Non Operational -> ...

Expected results:
Host stays Non Operational until the required network is back up.
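For the nested-KVM case, here is a minimal reproduction sketch, run on the outer hypervisor. It assumes the virtualized RHV host is the libvirt domain host2 and the interface carrying the required network is vnet9, as in the step above. Note that domif-getlink only reports the current link state; domif-setlink is what actually changes it.

# Take the vNIC link down to simulate pulling the cable.
$ virsh domif-setlink host2 vnet9 down

# Confirm the link state reported by libvirt.
$ virsh domif-getlink host2 vnet9

# The engine should now move the host to Non-Operational; watch the engine
# events to see whether auto-recovery flips it back to Up in a loop instead
# of leaving it Non-Operational.

# Restore the link when finished.
$ virsh domif-setlink host2 vnet9 up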
The bug has failed QA on rhvm-4.4.3.3-0.19.el8ev.noarch and vdsm-4.40.30-1.el8ev.x86_64.

Scenario 1 - detach a required network from a host - works fine: the host does not enter the status loop and remains Non-Operational the whole time, as expected.

Scenario 2 - the one reported in this bug - link down an interface that has a required network attached to it - FAILED: the behavior is the same as before, with the host looping between Non-Operational and Up. The issue is not fixed.
Verified on 4.4.3.8-0.1.el8ev and vdsm-4.40.35-1.el8ev.x86_64.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5179