The customer just updated this: "The command executed pre-host reboot: killall -TERM glusterd glusterfs glusterfsd. Also attached now is the sos report from the run with this command executed prior to the host reboot." I have copied the sosreports to supportshell. Regards, Jay (Originally by Jaysamson Pankajakshan)
Hi Benny, Please provide a clear scenario so I can QA_ACK.
1. You need a setup with two hosts.
2. Create a VM with a lease and start it.
3. On the host that does not run the VM, change "'acquired': domStatus.hasHostId is True," to "'acquired': False," in /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py.
4. Add a delay at getStats on the host that does not run the VM: add a 60-second sleep to /usr/lib/python2.7/site-packages/vdsm/API.py at Global#getStats.
5. Hard reset the host that is running the VM.
6. Make sure there was no attempt to run the VM on the restarted host and that it was filtered out.
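A minimal sketch of the two vdsm tweaks in steps 3 and 4, assuming simplified stand-ins: `DomStatus`, `repo_stats`, and `get_stats` are hypothetical names approximating the real hsm.py/API.py code, not copied from it.

```python
import time

class DomStatus:
    """Hypothetical stand-in for the domain status object in vdsm's hsm.py."""
    hasHostId = True

def repo_stats(dom_status):
    # Unpatched hsm.py reports whether the host id is actually held:
    #     'acquired': domStatus.hasHostId is True,
    # The reproduction patch hard-codes it to False, so the engine's
    # "VM leases ready" scheduling filter should reject this host.
    return {'acquired': False}

def get_stats():
    # Step 4: delay Global#getStats (API.py) so the engine works with
    # stale stats from this host during the restart window.
    time.sleep(60)
    return {'status': 'ok'}
```

This only illustrates where the two edits land; on the real host the changes are made in place in the installed vdsm files and vdsm is restarted afterwards.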
(In reply to Benny Zlotnik from comment #41)
> 1. You need a setup with two hosts
> 2. Create a VM with a lease and start it
> 3. The host that does not run the VM should have
> "'acquired': domStatus.hasHostId is True," changed to 'acquired': False,
> at /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py
As it's RHEL8 with Python 3, please replace the python2.7 in the path with python3.6.
> 4. Add a delay at getStats on the host that does not run the VM, add a 60
> sleep to /usr/lib/python2.7/site-packages/vdsm/API.py at Global#getStats
> 5. Hard reset the host that is running the VM
> 6. Make sure there was no attempt to run the VM on the restarted host and
> that it was filtered out
(In reply to Avihai from comment #42)
> (In reply to Benny Zlotnik from comment #41)
> > 1. You need a setup with two hosts
> > 2. Create a VM with a lease and start it
> > 3. The host that does not run the VM should have
> > "'acquired': domStatus.hasHostId is True," changed to 'acquired': False,
> > at /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py
> As it's RHEL8 with python3, please replace the python path with 3.6
> > 4. Add a delay at getStats on the host that does not run the VM, add a 60
> > sleep to /usr/lib/python2.7/site-packages/vdsm/API.py at Global#getStats
> > 5. Hard reset the host that is running the VM
> > 6. Make sure there was no attempt to run the VM on the restarted host and
> > that it was filtered out
Disregard this comment, as it describes the verification flow for RHEL8/RHV 4.4. Please use the verification flow from comment 41: https://bugzilla.redhat.com/show_bug.cgi?id=1768168#c41
Created attachment 1636178 [details] Logs
After I spoke with Benny, these are the steps to reproduce the bug:

1. You need a setup with two hosts.
2. Create a VM with a lease and start the VM.
3. On the host that doesn't run the VM, change the row "'acquired': domStatus.hasHostId is True," to "'acquired': False," in /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py.
4. Add a delay at getStats on the host that runs the VM: add a 60-second sleep to /usr/lib/python2.7/site-packages/vdsm/API.py at Global#getStats.
5. Reboot the host that is running the VM.
6. Make sure there is no attempt to run the VM on the restarted host or on the other host.

The status of the VM should be Unknown; the fix for this bug is verified by the WARN "VM lease is not ready yet" appearing in the logs. These are the rows from the engine log:

2019-11-17 14:29:53,680+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-68593) [69254924] EVENT_ID: VDS_ALERT_NO_PM_CONFIG_FENCE_OPERATION_SKIPPED(9,028), Host host_mixed_3 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"
2019-11-17 14:29:53,688+02 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-68593) [69254924] Trying to release exclusive lock which does not exist, lock key: 'b2e12f99-392d-42b8-b3d8-8e371ccf8ce7VDS_FENCE'
2019-11-17 14:30:20,872+02 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (EE-ManagedThreadFactory-engine-Thread-68607) [7851617c] Candidate host 'host_mixed_2' ('de79a846-13e5-4657-9449-a80efd46dc10') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'VM leases ready' (correlation id: null)
2019-11-17 14:30:20,874+02 WARN [org.ovirt.engine.core.bll.RunVmCommand] (EE-ManagedThreadFactory-engine-Thread-68607) [7851617c] Validation of action 'RunVm' failed for user SYSTEM.
Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_2,$filterName VM leases ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

Benny, can you please ack that I hit the customer scenario?
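The engine-log check above can be done mechanically. This is a hypothetical helper (not part of RHV or the engine), matching the "VM leases ready" filter and validation messages shown in the log excerpt:

```python
import re

# Patterns indicating the fix is working: the host is filtered out by the
# "VM leases ready" scheduling filter instead of the engine attempting to
# start the VM there.
FILTERED = re.compile(r"filtered out by .* filter 'VM leases ready'")
LEASE_NOT_READY = re.compile(r"ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST")

def lease_filter_hit(engine_log_lines):
    """Return True if any engine.log line shows the VM-lease filter firing."""
    return any(FILTERED.search(line) or LEASE_NOT_READY.search(line)
               for line in engine_log_lines)

sample = [
    "2019-11-17 14:30:20,872+02 INFO ... Candidate host 'host_mixed_2' "
    "was filtered out by 'VAR__FILTERTYPE__INTERNAL' "
    "filter 'VM leases ready' (correlation id: null)",
]
print(lease_filter_hit(sample))  # → True
```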
Looks good!
Created attachment 1637026 [details] New_Logs
Verified:
ovirt-engine-4.3.7.2-0.1.el7.noarch
vdsm-4.30.37-1.el7ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4229