Description of problem: Live storage migration operation sometimes failed due to the following error on VDSM: 2017-06-05 11:54:46,591+0300 INFO (jsonrpc/2) [vdsm.api] START sdm_copy_data(job_id='12fa9df4-9357-42de-9950-a83d3dd02533', source={'img_id': '2cc28ab4-7e16-4106-b8f2-00e49c64a29c', 'sd_id': '28df2ec8-431c-4f90-b18a-58d9d13882d7', 'endpoint_type': 'div', 'vol_id': 'c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1'}, destination={'generation': 0, 'img_id': '2cc28ab4-7e16-4106-b8f2-00e49c64a29c', 'sd_id': '45f6b9b4-6b94-44db-a740-2494f05bffa4', 'endpoint_type': 'div', 'vol_id': 'c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1'}) from=::ffff:10.35.69.13,33844, flow_id=disks_syncAction_ac2ff124-a7a0-47ba (api:46) 2017-06-05 11:54:46,592+0300 INFO (jsonrpc/2) [vdsm.api] FINISH sdm_copy_data return=None from=::ffff:10.35.69.13,33844, flow_id=disks_syncAction_ac2ff124-a7a0-47ba (api:52) 2017-06-05 11:54:46,592+0300 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call SDM.copy_data succeeded in 0.00 seconds (__init__:539) 2017-06-05 11:54:46,593+0300 INFO (tasks/9) [storage.ThreadPool.WorkerThread] START task b31cd91b-f35b-4f15-95b5-5de63808760f (cmd=<bound method Task.commit of <storage.task.Task instance at 0x7f71e4244440>>, args=None) (threadPool:208) 2017-06-05 11:54:46,593+0300 INFO (tasks/9) [root] Running job '12fa9df4-9357-42de-9950-a83d3dd02533'... (jobs:179) 2017-06-05 11:54:46,820+0300 INFO (tasks/9) [storage.LVM] Refreshing lvs: vg=45f6b9b4-6b94-44db-a740-2494f05bffa4 lvs=['leases'] (lvm:1291) 2017-06-05 11:54:46,821+0300 INFO (tasks/9) [storage.LVM] Refreshing LVs (vg=45f6b9b4-6b94-44db-a740-2494f05bffa4, lvs=['leases']) (lvm:1319) 2017-06-05 11:54:46,929+0300 INFO (tasks/9) [storage.SANLock] Acquiring Lease(name='c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1', path='/dev/45f6b9b4-6b94-44db-a740-2494f05bffa4/leases', offset=113246208) for host id 1 (clusterlock:343) 2017-06-05 11:54:46,930+0300 ERROR (tasks/9) [storage.guarded] Error acquiring lock <VolumeLease ns=04_lease_45f6b9b4-6b94-44db-a740-2494f05bffa4, name=c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1, mode=exclusive at 0x3acaf10> (guarded:96) 2017-06-05 11:54:46,931+0300 ERROR (tasks/9) [root] Job '12fa9df4-9357-42de-9950-a83d3dd02533' failed (jobs:217) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in run self._run() File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 64, in _run with guarded.context(self._source.locks + self._dest.locks): File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__ six.reraise(*exc) File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__ lock.acquire() File "/usr/share/vdsm/storage/volume.py", line 1418, in acquire dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id) File "/usr/share/vdsm/storage/sd.py", line 471, in acquireVolumeLease self._domainLock.acquire(hostId, lease) File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire "Cannot acquire %s" % (lease,), str(e)) AcquireLockFailure: Cannot obtain lock: "id=45f6b9b4-6b94-44db-a740-2494f05bffa4, rc=28, out=Cannot acquire Lease(name='c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1', path='/dev/45f6b9b4-6b94-44db-a740-2494f05bffa4/leases', offset=113246208), err=(28, 'Sanlock resource not acquired', 'No space left on device')" 2017-06-05 11:54:46,932+0300 INFO (tasks/9) [root] Job '12fa9df4-9357-42de-9950-a83d3dd02533' will be deleted in 3600 seconds (jobs:245) Version-Release number of selected component (if applicable): VDSM - 4.19.17-1.el7ev.x86_64 Engine - 4.1.3.1-0.1.el7 How reproducible: 70% Steps to Reproduce: 1. Create 2 storage domains 2. Craete a VM with disk 3. Run the VM 4. Move the VM's disk to other storage domain Actual results: Move disk can failed with error massage in the engine - 2017-06-05 11:54:49,428+03 INFO [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler6) [disks_syncAction_ac2ff124-a7a0-47ba] Command CopyData id: 'c9803db7-9802-4a0d-80a3-a22fc42dd19f': execution was completed, the command status is 'FAILED' 2017-06-05 11:54:49,432+03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_ac2ff124-a7a0-47ba] Command 'CopyImageGroupWithData' (id: 'e774b141-5221-488f-b621-853ca8a7e78d') waiting on child command id: 'd413e773-b740-4570-96ad-d0df0e5f311f' type:'CopyImageGroupVolumesData' to complete 2017-06-05 11:54:50,439+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler9) [disks_syncAction_ac2ff124-a7a0-47ba] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure. Expected results: Move disk operation should succeed Additional info: VDSM and Engine logs attached
This is a duplicate of bug #1422508 *** This bug has been marked as a duplicate of bug 1422508 ***