Bug 1684113 - Failed to start VM with LibVirtError "Failed to acquire lock: No space left on device" (code=1)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.3.0
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.2
Target Release: 4.3.2.1
Assignee: Eyal Shenitzky
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Duplicates: 1715128
Depends On:
Blocks:
Reported: 2019-02-28 13:16 UTC by Yosi Ben Shimon
Modified: 2020-08-03 15:39 UTC (History)
CC List: 7 users

Fixed In Version: ovirt-engine-4.3.2.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-26 07:20:48 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.3+


Attachments
logs (3.26 MB, application/gzip)
2019-02-28 13:16 UTC, Yosi Ben Shimon


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 4210491 | 0 | None | None | None | 2019-06-10 18:53:24 UTC
oVirt gerrit | 98294 | 0 | master | MERGED | core: save temporary domains data in a temporary var | 2020-08-18 11:41:23 UTC

Description Yosi Ben Shimon 2019-02-28 13:16:57 UTC
Created attachment 1539483
logs

Description of problem:
Starting a VM fails with libvirtError "Failed to acquire lock: No space left on device".
Looks like the same issue as in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1599732

The bug above is already verified, but this appears to be the same error in the same test case.

Following this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1599732#c9 I added "refresh capabilities" to the test case steps before starting the VM.

From the engine log (same for all hosts retries):


2019-02-26 05:18:55,090+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] VM '516dbb5c-a5f5-423f-b91f-15930d3c1990'(vm_0_TestCase25515_2605155367) moved from 'WaitForLaunch' --> 'Down'
2019-02-26 05:18:55,113+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN_ERROR(119), VM vm_0_TestCase25515_2605155367 is down with error. Exit message: Failed to acquire lock: No space left on device.
2019-02-26 05:18:55,113+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] add VM '516dbb5c-a5f5-423f-b91f-15930d3c1990'(vm_0_TestCase25515_2605155367) to rerun treatment
2019-02-26 05:18:55,127+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-7) [] Rerun VM '516dbb5c-a5f5-423f-b91f-15930d3c1990'. Called from VDS 'host_mixed_1'
2019-02-26 05:18:55,146+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-13222) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM vm_0_TestCase25515_2605155367 on Host host_mixed_1.

From VDSM log (same for all hosts):


2019-02-26 05:18:54,175+0200 ERROR (vm/516dbb5c) [virt.vm] (vmId='516dbb5c-a5f5-423f-b91f-15930d3c1990') The vm start process failed (vm:937)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2855, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Failed to acquire lock: No space left on device
2019-02-26 05:18:54,175+0200 INFO  (vm/516dbb5c) [virt.vm] (vmId='516dbb5c-a5f5-423f-b91f-15930d3c1990') Changed state to Down: Failed to acquire lock: No space left on device (code=1) (vm:1675)
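
When scanning several hosts' VDSM logs for this failure, it can help to pull the VM ID and the failure reason out of the "Changed state to Down" line. The following is a minimal sketch, assuming the line format matches the excerpt above; the regular expression and the function names are illustrative and are not part of VDSM.

```python
import re

# Pattern for the VDSM "Changed state to Down" line quoted above.
# The layout is inferred from that single excerpt (an assumption).
DOWN_RE = re.compile(
    r"\(vmId='(?P<vm_id>[0-9a-f-]+)'\) "
    r"Changed state to Down: (?P<reason>.*?) \(code=(?P<code>\d+)\)"
)


def parse_down_event(line):
    """Return (vm_id, reason, code) for a matching line, else None."""
    match = DOWN_RE.search(line)
    if match is None:
        return None
    return match.group("vm_id"), match.group("reason"), int(match.group("code"))


def is_lock_failure(reason):
    """True when the failure reason is the lock-acquisition error from this bug."""
    return reason.startswith("Failed to acquire lock")


# The line from the VDSM log above, used as sample input.
SAMPLE = (
    "2019-02-26 05:18:54,175+0200 INFO  (vm/516dbb5c) [virt.vm] "
    "(vmId='516dbb5c-a5f5-423f-b91f-15930d3c1990') Changed state to Down: "
    "Failed to acquire lock: No space left on device (code=1) (vm:1675)"
)

if __name__ == "__main__":
    event = parse_down_event(SAMPLE)
    print(event, is_lock_failure(event[1]))
```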



Version-Release number of selected component (if applicable):
ovirt-engine-4.3.1.1-0.1.el7.noarch


How reproducible:
Reproduced only after ~6 executions, in the same environment and with the same errors.

Steps to Reproduce (according to the TestCase):

1. Create an HA VM whose lease resides on the default storage domain
2. Move the storage domain to maintenance
3. Try to start VM (expected result -> fail)
4. Activate the storage domain on which the lease resides
5. Start VM
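
The five steps above can be sketched as a small driver. This is only an illustration: `env` and its methods (`create_ha_vm`, `deactivate_domain`, `activate_domain`, `start_vm`) are hypothetical names, not the actual test framework or the oVirt SDK, and `FakeEnv` is an in-memory stand-in whose `lock_failure_after_activation` flag merely imitates the failure reported here.

```python
def run_test_case(env, vm, domain):
    """Run the five steps and return a list of (check, passed) tuples.

    env is a hypothetical test-environment object; start_vm is assumed
    to return True when the VM starts and False otherwise.
    """
    results = []
    env.create_ha_vm(vm, lease_domain=domain)   # step 1
    env.deactivate_domain(domain)               # step 2
    # Step 3: starting must fail while the lease domain is in maintenance.
    results.append(("start in maintenance fails", not env.start_vm(vm)))
    env.activate_domain(domain)                 # step 4
    # Step 5: starting must succeed once the domain is active again.
    results.append(("start after activation succeeds", env.start_vm(vm)))
    return results


class FakeEnv:
    """In-memory stand-in used only to exercise run_test_case."""

    def __init__(self, lock_failure_after_activation=False):
        self.active = True
        self.lock_failure = lock_failure_after_activation
        self.lease_domain = None

    def create_ha_vm(self, vm, lease_domain):
        self.lease_domain = lease_domain

    def deactivate_domain(self, domain):
        self.active = False

    def activate_domain(self, domain):
        self.active = True

    def start_vm(self, vm):
        # The VM starts only if the domain is active and no lock failure occurs.
        return self.active and not self.lock_failure


if __name__ == "__main__":
    for check, passed in run_test_case(FakeEnv(True), "vm_0", "default_sd"):
        print(check, "PASS" if passed else "FAIL")
```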

Actual results:
The VM failed to start

Expected results:
The VM should start


Additional info:
Attached relevant logs

Comment 3 Yosi Ben Shimon 2019-03-20 14:24:46 UTC
Tested using:
ovirt-engine-4.3.2.1-0.1.el7.noarch

After discussing this with Eyal, I changed the test case so that it tries to start the VM several times, waiting ~10 seconds between tries (a total of ~60 seconds, or 6 tries).
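
The retry described above might look roughly like the sketch below. It is not the actual test code: `start_vm` is a hypothetical callable that returns True when the VM starts, and the defaults (6 attempts, 10 seconds apart) are taken from the approximate figures in this comment.

```python
import time


def start_vm_with_retries(start_vm, attempts=6, delay=10.0, sleep=time.sleep):
    """Call start_vm() until it returns True or the attempts run out.

    start_vm is a hypothetical callable supplied by the test framework.
    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if start_vm():
            return attempt
        if attempt < attempts:
            # Wait before the next try; no wait after the final failure.
            sleep(delay)
    return None
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting, for example `start_vm_with_retries(fake_start, sleep=lambda s: None)`.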

The VM starts successfully.

Moving to VERIFIED

Comment 4 Sandro Bonazzola 2019-03-26 07:20:48 UTC
This bug fix is included in the oVirt 4.3.2 release, published on March 19th, 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 5 Tal Nisan 2019-06-10 14:31:26 UTC
*** Bug 1715128 has been marked as a duplicate of this bug. ***

