Bug 1595303
| Summary: | HE VM migration fails with libvirtError: resource busy: Failed to acquire lock: Lease is held by another host |
|---|---|
| Product: | [oVirt] ovirt-engine |
| Reporter: | Polina <pagranat> |
| Component: | BLL.HostedEngine |
| Assignee: | Doron Fediuck <dfediuck> |
| Status: | CLOSED WORKSFORME |
| QA Contact: | Polina <pagranat> |
| Severity: | medium |
| Priority: | unspecified |
| Version: | 4.2.2 |
| CC: | ahadas, bugs, michal.skrivanek, msivak, pagranat |
| Target Milestone: | --- |
| Target Release: | --- |
| Keywords: | Automation |
| Flags: | pm-rhel: ovirt-4.5? |
| Hardware: | x86_64 |
| OS: | Linux |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Last Closed: | 2021-06-01 12:00:39 UTC |
| Type: | Bug |
| Regression: | --- |
| oVirt Team: | Virt |
| Cloudforms Team: | --- |
---

Hi Polina, can you please attach some additional information?

- What kind of storage domain did you use for hosted engine (NFS3, NFS4, iSCSI..)?
- How did sanlock look just before the migration (`sanlock client status` from both the source and the destination)?

Meital already answered our question about how the migration was started - by clicking the Migrate button in the UI. Is that correct?

---

(In reply to Michal Skrivanek from comment #1)

> https://bugzilla.redhat.com/show_bug.cgi?id=1459829#c31 ?

Maybe, but I do not think so. The error here is "libvirtError: resource busy: Failed to acquire lock: Lease is held by another host", and that seems to imply the sanlock on the other host knew about the lockspace.

---

(In reply to Martin Sivák from comment #2)

Hi, the Hosted Engine disk is on iSCSI for this environment. About "how the migration started" - there was no clicking on a UI button. The failures happened during an automation build run. The tests send a REST action:

```
2018-06-23 15:52:16,246 - MainThread - art.ll_lib.vms - INFO - Migrate VM HostedEngine
2018-06-23 15:52:16,246 - MainThread - vms - DEBUG - Action request content is --
url:/ovirt-engine/api/vms/96b4f434-de9e-4be6-b842-adae55933dc2/migrate
body:<action>
    <async>false</async>
    <force>true</force>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <host id="074db613-5fb8-4722-8801-130797dc18b1"/>
</action>
```

`sanlock client status` is not reported to the logs.

---

Polina, still reproducible?

---

Yes, in the last automation runs we saw this problem twice on 4.3.3.2. Since the engine was down for a long time afterwards and the whole environment was not able to run the tests, we had to reprovision and rebuild everything, so we didn't save the logs. The next time I see it, I will update with the logs.

---

Created attachment 1557486 [details]
logs 4.2.8-7

The migration failure is reproduced in the last ovirt-engine-4.2.8.7-0.1.el7ev.noarch run. The time of the failed migration in the attached logs is 2019-04-18 20:12:56.

The engine.log also contains a lot of errors, like "createCommand failed: WFLYWELD0039: Singleton not set for null. This means that you are trying to access a weld deployment with a Thread Context ClassLoader that is not associated with the deployment.", though that seems to be unrelated to this problem (https://bugzilla.redhat.com/show_bug.cgi?id=1701898).

---

Created attachment 1571545 [details]
logs 4.2.8

Happened again in the last 4.2 run (ovirt-engine-4.2.8.7-0.1.el7ev.noarch). Logs logs_4.2.8.tar.gz attached.
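For reference, the migrate action that the automation issues can be reconstructed from the request logged above. The sketch below is a minimal, hypothetical helper that builds the same URL and XML body; the engine base URL, credentials, and the helper's name are placeholders, while the VM and destination-host ids are the ones that appear in the automation log.

```python
# Sketch of the forced-migration REST action seen in the automation log.
# build_migrate_request is a hypothetical helper; the ids below are taken
# verbatim from the logged request.
VM_ID = "96b4f434-de9e-4be6-b842-adae55933dc2"    # HostedEngine VM
HOST_ID = "074db613-5fb8-4722-8801-130797dc18b1"  # destination host

def build_migrate_request(engine_api, vm_id, host_id, expiry=10):
    """Return the (url, xml_body) pair for a forced migration to a given host."""
    url = "{}/vms/{}/migrate".format(engine_api, vm_id)
    body = (
        "<action>"
        "<async>false</async>"
        "<force>true</force>"
        "<grace_period><expiry>{}</expiry></grace_period>"
        '<host id="{}"/>'
        "</action>"
    ).format(expiry, host_id)
    return url, body

# Hypothetical engine URL; post the body with any HTTP client using
# Content-Type: application/xml and engine admin credentials.
url, body = build_migrate_request(
    "https://engine.example.com/ovirt-engine/api", VM_ID, HOST_ID
)
print(url)
```

Note that the logged request uses `<force>true</force>` with a 10-second grace period, so the engine retries the migration once before giving up.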
---

Deprecating SLA team usage, moving to Virt.

---

Hi Arik, this failure was never seen in 4.4.6.

---

This issue didn't reproduce lately (comment 11) - it could be that it was fixed by other changes. Anyway, it doesn't make sense to put effort into investigating it at this point. If it happens again, feel free to reopen.
---

Created attachment 1454685 [details]
logs

Description of problem: HE migration sometimes fails with libvirtError: resource busy: Failed to acquire lock: Lease is held by another host.

Version-Release number of selected component (if applicable): rhv-release-4.2.4-6-001.noarch

How reproducible: sometimes happens; not easily reproduced.

Steps to Reproduce:
1. Run an HE environment with three hosts and the hosted storage on iSCSI.
2. Sometimes the HE VM migration fails with the following trace (please see lynx16_vdsm.log):

```
2018-06-23 15:52:15,547+0300 ERROR (vm/96b4f434) [virt.vm] (vmId='96b4f434-de9e-4be6-b842-adae55933dc2') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2876, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: resource busy: Failed to acquire lock: Lease is held by another host
```

Expected results: migration succeeds.

Additional info: the attachment contains agent.log, broker.log, engine.log, logs, lynx14_vdsm.log, lynx16_vdsm.log, lynx17_vdsm.log. The migration attempt was from lynx16 to lynx17.
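Because the failure is intermittent and the logs were lost more than once during reprovisioning, a small scanner that flags the lease error in a vdsm.log can confirm this failure mode quickly. This is an illustrative sketch, not part of any oVirt tooling; the function name and the sample lines are hypothetical, and the error string matched is the one from the trace above.

```python
# Hypothetical helper: scan vdsm.log lines for the sanlock lease failure
# that marks this bug, returning (line_number, line) pairs for each hit.
import re

LEASE_ERROR = re.compile(
    r"libvirtError: resource busy: Failed to acquire lock: "
    r"Lease is held by another host"
)

def find_lease_failures(log_lines):
    """Return (1-based line number, line) pairs where the lease error appears."""
    return [(n, line) for n, line in enumerate(log_lines, start=1)
            if LEASE_ERROR.search(line)]

# Illustrative usage against two sample lines shaped like the trace above;
# only the second line matches the pattern.
sample = [
    "2018-06-23 15:52:15,547+0300 ERROR (vm/96b4f434) [virt.vm] The vm start process failed (vm:943)",
    "libvirtError: resource busy: Failed to acquire lock: Lease is held by another host",
]
print(find_lease_failures(sample))
```

Running such a check on each host's vdsm.log right after a failed run would have preserved at least the matching lines even when the full environment had to be rebuilt.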