The customer just updated this:

"The command executed pre-host boot: killall -TERM glusterd glusterfs glusterfsd
Also now attached the sos report from the run with this command execute prior to the host reboot."

I have copied the sosreports to supportshell.

Regards,
Jay
Verification flow taken from https://bugzilla.redhat.com/show_bug.cgi?id=1768168#c42:

> 1. You need a setup with two hosts
> 2. Create a VM with a lease and start it
> 3. The host that does not run the VM should have
> " 'acquired': domStatus.hasHostId is True, " changed to 'acquired': False,
> at /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py

As this is RHEL 8 with Python 3, please replace python2.7 in the paths with python3.6.

> 4. Add a delay at getStats on the host that does not run the VM, add a 60
> sleep to /usr/lib/python2.7/site-packages/vdsm/API.py
> at Global#getStats
> 5. Hard reset the host the that is running the VM
> 6. Make sure there was no attempt to run the VM on the restarted host and
> that it was filtered out
WARN: Bug status (ON_QA) wasn't changed but the folowing should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
The WARN message [1] doesn't appear in engine.log in version 4.4.0-0.9.

Steps to reproduce this bug (4.4 V):

1. You need a setup with two hosts.
2. Create a VM with a lease and start the VM.
3. On the host that doesn't run the VM, change the row
   " 'acquired': domStatus.hasHostId is True, " to 'acquired': False,
   in /usr/lib/python3.6/site-packages/vdsm/storage/hsm.py
4. Add a delay at getStats on the host that runs the VM: add a 60 sleep to
   /usr/lib/python3.6/site-packages/vdsm/API.py at Global#getStats
5. Reboot the host that is running the VM.
6. Make sure there is no attempt to run the VM on the restarted host or on the other host.

The status of the VM should be Unknown. The fix is verified by checking whether the WARN "VM lease is not ready yet" appears in the logs.

[1] 2019-11-17 14:30:20,874+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (EE-ManagedThreadFactory-engine-Thread-68607) [7851617c] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_2,$filterName VM leases ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
(This message is from the verification of 4.3)

The WARN messages that do appear:

2019-12-22 14:46:22,441+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] EVENT_ID: VDS_ALERT_NO_PM_CONFIG_FENCE_OPERATION_SKIPPED(9,028), Host host_mixed_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"

2019-12-22 14:46:22,448+02 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] Trying to release exclusive lock which does not exist, lock key: '1b6761c2-336b-4a53-bd9b-6f6b64d9d377VDS_FENCE'
(In reply to Shir Fishbain from comment #45)
> The WARM message [1] doesn't appear in the engine.log in the 4.4.0-0.9
> version.
> There are the steps to reproduce this bug (4.4 V):
>
> 1. You need a setup with two hosts
> 2. Create a VM with lease and start the VM
> 3. The host that doesn't run the VM should have
> " 'acquired': domStatus.hasHostId is True, " change this row to 'acquired':
> False,
> at /usr/lib/python3.6/site-packages/vdsm/storage/hsm.py
> 4. Add a delay at getStats on the host that run the VM, add a 60 sleep to
> /usr/lib/python3.6/site-packages/vdsm/API.py
> at Global#getStats
> 5. Make a reboot to the host that is running the VM
> 6. Make sure there isn't attempt to run the VM on the restarted host and on
> the other host
> The status of the VM should be Unknown, the fixing of the bug was to see if
> the WARN: "VM lease is not ready yet" appears in the logs.
>
> [1] 2019-11-17 14:30:20,874+02 WARN
> [org.ovirt.engine.core.bll.RunVmCommand]
> (EE-ManagedThreadFactory-engine-Thread-68607) [7851617c] Validation of
> action 'RunVm' failed for user SYSTEM. Reasons:
> VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,
> VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_2,$filterName VM leases
> ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,
> SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
> (This message is from the verification of 4.3)
>
> The WARN messages that appear :
> 2019-12-22 14:46:22,441+02 WARN
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] EVENT_ID:
> VDS_ALERT_NO_PM_CONFIG_FENCE_OPERATION_SKIPPED(9,028), Host host_mixed_2
> became non responsive. It has no power management configured. Please check
> the host status, manually reboot it, and click "Confirm Host Has Been
> Rebooted"
>
> 2019-12-22 14:46:22,448+02 WARN
> [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
> (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] Trying to release
> exclusive lock which does not exist, lock key:
> '1b6761c2-336b-4a53-bd9b-6f6b64d9d377VDS_FENCE'

The host should eventually come up. How long is it stuck?
Verified - The WARN message appears in engine.log:

2019-12-23 18:37:09,445+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (EE-ManagedThreadFactory-engine-Thread-4724) [2445e6b8] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_3,$filterName VM leases ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

ovirt-engine-4.4.0-0.13.master.el7.noarch
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
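For anyone repeating this verification, the check for the WARN line can be scripted instead of searched by eye. The following is only a rough sketch, not part of the fix: the function name lease_not_ready_hosts is made up for illustration, and the pattern assumes the RunVmCommand WARN format shown in the comments of this bug.

```python
import re

# Reason code taken from the WARN line quoted in this bug.
LEASE_NOT_READY = "ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST"

# Matches a RunVm validation failure and captures the comma-separated reasons.
# Assumes the log line format quoted in this bug; other versions may differ.
_RUNVM_FAILED = re.compile(
    r"WARN\s+\[org\.ovirt\.engine\.core\.bll\.RunVmCommand\].*"
    r"Validation of action 'RunVm' failed.*?Reasons:\s*(?P<reasons>.+)$"
)


def lease_not_ready_hosts(lines):
    """Return the host names filtered out because the VM lease was not ready.

    Hypothetical helper for illustration; `lines` is any iterable of log lines.
    """
    hosts = []
    for line in lines:
        match = _RUNVM_FAILED.search(line)
        if not match:
            continue
        reasons = [r.strip() for r in match.group("reasons").split(",")]
        if LEASE_NOT_READY not in reasons:
            continue
        # The host name is carried in a "$hostName <name>" entry.
        for reason in reasons:
            if reason.startswith("$hostName "):
                hosts.append(reason[len("$hostName "):])
    return hosts
```

It could be run as lease_not_ready_hosts(open(path)), where path points at the engine.log of the setup under test (the exact location depends on the installation). An empty result would correspond to the situation reported for 4.4.0-0.9, and a non-empty one to the verified run.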
WARN: Bug status (VERIFIED) wasn't changed but the folowing should be fixed: [Found non-acked flags: '{}', ] For more info please contact: rhv-devops
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:3247