Bug 1741625 - VM fails to be re-started with error: Failed to acquire lock: No space left on device
Summary: VM fails to be re-started with error: Failed to acquire lock: No space left o...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.0
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On:
Blocks: 1768168
Reported: 2019-08-15 15:51 UTC by Jay Samson
Modified: 2020-08-04 13:20 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1768168 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:20:00 UTC
oVirt Team: Storage
Target Upstream Version:
kdhananj: needinfo-
kdhananj: needinfo-
bzlotnik: needinfo-
lsvaty: testing_plan_complete-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2020:3247 | 0 | None | None | None | 2020-08-04 13:20:34 UTC
oVirt gerrit | 103822 | 0 | 'None' | MERGED | core: clear domains cache when changing state | 2021-02-05 06:07:14 UTC
oVirt gerrit | 104355 | 0 | 'None' | MERGED | core: clear domains cache when changing state | 2021-02-05 06:07:14 UTC

Comment 28 Jay Samson 2019-10-01 09:20:35 UTC
The customer just updated this:

"The command executed pre-host boot:

killall -TERM glusterd glusterfs glusterfsd

Also now attached the sos report from the run with this command execute prior to the host reboot."

I have copied the sosreports to supportshell

Regards,
Jay

Comment 41 Avihai 2019-11-11 15:12:41 UTC
Verification flow taken from https://bugzilla.redhat.com/show_bug.cgi?id=1768168#c42:

> 1. You need a setup with two hosts
> 2. Create a VM with a lease and start it
> 3. The host that does not run the VM should have
> " 'acquired': domStatus.hasHostId is True, " changed to  'acquired': False,
> at /usr/lib/python2.7/site-packages/vdsm/storage/hsm.py 
As it's RHEL 8 with Python 3, please replace python2.7 in the path with python3.6.

> 4. Add a delay at getStats on the host that does not run the VM, add a 60
> sleep to /usr/lib/python2.7/site-packages/vdsm/API.py
> at Global#getStats
> 5. Hard reset the host that is running the VM
> 6. Make sure there was no attempt to run the VM on the restarted host and
> that it was filtered out
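
Steps 3 and 4 amount to two fault injections on the host that does not run the VM: the lease is reported as not acquired, and the stats reply is delayed. A minimal Python sketch of that idea follows. It is not vdsm code: `DomainStatus`, `stats_entry`, `delayed`, and `get_stats` are hypothetical names, and the value 60 is assumed to be seconds (as in `time.sleep(60)`). Only the `'acquired': domStatus.hasHostId is True` expression and the delay of 60 come from the steps above.

```python
import time
from collections import namedtuple

# Hypothetical stand-in for the per-domain status object read in hsm.py.
DomainStatus = namedtuple("DomainStatus", ["hasHostId"])


def stats_entry(domStatus, force_not_acquired=False):
    """Build the 'acquired' field as described in step 3.

    With force_not_acquired=True this mimics the edited line
    ('acquired': False): the host always reports the lease as not acquired.
    """
    if force_not_acquired:
        return {"acquired": False}
    return {"acquired": domStatus.hasHostId is True}


def delayed(seconds, sleep=time.sleep):
    """Wrap a function so that every call first waits `seconds` (step 4)."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            sleep(seconds)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical getStats replacement: slow to answer, lease reported missing.
@delayed(60)
def get_stats():
    return stats_entry(DomainStatus(hasHostId=True), force_not_acquired=True)
```

The `sleep` parameter exists only so the delay can be swapped out when trying the sketch; the actual verification edits the installed vdsm files directly, as the steps say.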

Comment 43 RHV bug bot 2019-12-13 13:14:16 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

Comment 44 RHV bug bot 2019-12-20 17:44:10 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

Comment 45 Shir Fishbain 2019-12-22 13:32:38 UTC
The WARN message [1] doesn't appear in engine.log in version 4.4.0-0.9.
These are the steps used to reproduce this bug (4.4 V):

1. You need a setup with two hosts
2. Create a VM with a lease and start the VM
3. On the host that doesn't run the VM, change the row
" 'acquired': domStatus.hasHostId is True, " to 'acquired': False,
in /usr/lib/python3.6/site-packages/vdsm/storage/hsm.py
4. Add a delay at getStats on the host that runs the VM: add a sleep of 60 to /usr/lib/python3.6/site-packages/vdsm/API.py
at Global#getStats
5. Reboot the host that is running the VM
6. Make sure there is no attempt to run the VM on the restarted host or on the other host
The status of the VM should be Unknown. The fix is verified by checking whether the WARN "VM lease is not ready yet" appears in the logs.

[1] 2019-11-17 14:30:20,874+02 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (EE-ManagedThreadFactory-engine-Thread-68607) [7851617c] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_2,$filterName VM leases ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL
(This message is from the verification of 4.3)

The WARN messages that do appear:
2019-12-22 14:46:22,441+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] EVENT_ID: VDS_ALERT_NO_PM_CONFIG_FENCE_OPERATION_SKIPPED(9,028), Host host_mixed_2 became non responsive. It has no power management configured. Please check the host status, manually reboot it, and click "Confirm Host Has Been Rebooted"

2019-12-22 14:46:22,448+02 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-95589) [1c18dacb] Trying to release exclusive lock which does not exist, lock key: '1b6761c2-336b-4a53-bd9b-6f6b64d9d377VDS_FENCE'

Comment 46 Benny Zlotnik 2019-12-22 13:43:14 UTC
(In reply to Shir Fishbain from comment #45)

The host should eventually be up, how long is it stuck?

Comment 47 Shir Fishbain 2019-12-23 17:18:24 UTC
Verified - The WARN message appears in engine.log:

2019-12-23 18:37:09,445+02 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (EE-ManagedThreadFactory-engine-Thread-4724) [2445e6b8] Validation of action 'RunVm' failed for user SYSTEM. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,SCHEDULING_ALL_HOSTS_FILTERED_OUT,VAR__FILTERTYPE__INTERNAL,$hostName host_mixed_3,$filterName VM leases ready,ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST,SCHEDULING_HOST_FILTERED_REASON_WITH_DETAIL

ovirt-engine-4.4.0-0.13.master.el7.noarch
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
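
The verification rests on one observable: a `RunVmCommand` WARN line in engine.log whose reasons include `ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST`. As a rough sketch, assuming the line layout shown in this bug (a comma-separated list after `Reasons: `), a small Python helper could pull out the hosts that were filtered. `parse_reasons` and `lease_not_ready_hosts` are hypothetical names, not part of ovirt-engine.

```python
LEASE_NOT_READY = "ACTION_TYPE_FAILED_VM_LEASE_IS_NOT_READY_FOR_HOST"


def parse_reasons(line):
    """Return the comma-separated reasons of a validation-failure line."""
    marker = "Reasons: "
    idx = line.find(marker)
    if idx == -1:
        return []
    tail = line[idx + len(marker):]
    return [r.strip() for r in tail.split(",") if r.strip()]


def lease_not_ready_hosts(lines):
    """Yield host names filtered out because the VM lease was not ready."""
    for line in lines:
        # Only RunVmCommand warnings are relevant to this check.
        if "RunVmCommand" not in line or " WARN " not in line:
            continue
        reasons = parse_reasons(line)
        if LEASE_NOT_READY not in reasons:
            continue
        for reason in reasons:
            if reason.startswith("$hostName "):
                yield reason[len("$hostName "):]
```

Feeding the lines of engine.log to `lease_not_ready_hosts` and getting the expected host back corresponds to the "Verified" outcome above; an empty result corresponds to the situation reported in comment 45.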

Comment 48 RHV bug bot 2020-01-08 14:48:24 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

Comment 49 RHV bug bot 2020-01-08 15:15:01 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

Comment 50 RHV bug bot 2020-01-24 19:50:14 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops@redhat.com

Comment 54 errata-xmlrpc 2020-08-04 13:20:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3247

