Bug 1422508 - Operations which depend on acquire lease fail with error 'Sanlock resource not acquired', since the acquire has not completed before the operation takes place.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.20.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.1.5
: 4.19.27
Assignee: Nir Soffer
QA Contact: Raz Tamir
URL:
Whiteboard:
: 1459058 (view as bug list)
Depends On: 1469175
Blocks: 1459448
TreeView+ depends on / blocked
 
Reported: 2017-02-15 12:50 UTC by Raz Tamir
Modified: 2017-08-23 08:02 UTC (History)
10 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-08-23 08:02:31 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: blocker+


Attachments (Terms of Use)
logs (2.13 MB, application/x-gzip)
2017-02-15 12:50 UTC, Raz Tamir
no flags Details
new logs (750.53 KB, application/x-gzip)
2017-02-15 15:53 UTC, Raz Tamir
no flags Details
logs 21.2.17 (568.32 KB, application/x-gzip)
2017-02-21 17:02 UTC, Raz Tamir
no flags Details
engine and vdsm.log cold merge (4.03 MB, application/x-gzip)
2017-05-24 13:49 UTC, Raz Tamir
no flags Details
logs with patch (1.45 MB, application/x-gzip)
2017-06-28 13:37 UTC, Raz Tamir
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1043047 0 unspecified CLOSED [RFE] Storage domain gets activated even if one of the host can't access it 2021-02-22 00:41:40 UTC
oVirt gerrit 70441 0 master MERGED clusterlock: Improve logging when acquiring host id 2020-11-11 22:36:39 UTC
oVirt gerrit 70442 0 master MERGED monitor: Log changes in domain host id status 2020-11-11 22:36:59 UTC
oVirt gerrit 77762 0 master MERGED concurrent: Add ValidatingEvent 2020-11-11 22:36:38 UTC
oVirt gerrit 77763 0 master MERGED clusterlock: Wait until host id is acquired 2020-11-11 22:36:38 UTC
oVirt gerrit 77765 0 master MERGED clusterlock: Move logs out of the locked region 2020-11-11 22:36:38 UTC
oVirt gerrit 79163 0 ovirt-4.1 MERGED clusterlock: Improve logging when acquiring host id 2020-11-11 22:36:39 UTC
oVirt gerrit 79164 0 ovirt-4.1 MERGED concurrent: Add ValidatingEvent 2020-11-11 22:36:39 UTC
oVirt gerrit 79165 0 ovirt-4.1 MERGED clusterlock: Wait until host id is acquired 2020-11-11 22:37:00 UTC
oVirt gerrit 79166 0 ovirt-4.1 MERGED clusterlock: Move logs out of the locked region 2020-11-11 22:36:40 UTC
oVirt gerrit 79167 0 ovirt-4.1 ABANDONED tests: Add clusterlock test module 2020-11-11 22:36:40 UTC
oVirt gerrit 79168 0 ovirt-4.1 ABANDONED fakesanlock: Implement missing methods 2020-11-11 22:37:00 UTC
oVirt gerrit 80425 0 ovirt-4.1 MERGED monitor: Log changes in domain host id status 2020-11-11 22:36:40 UTC

Internal Links: 1043047

Description Raz Tamir 2017-02-15 12:50:49 UTC
Created attachment 1250602 [details]
logs

Description of problem:
I created a VM with 2 disks, one on an iSCSI storage domain and the second on NFS.
After deactivating and detaching the iSCSI storage domain, the VM becomes unregistered under the iSCSI storage domain.
I imported the storage domain back to the environment and registered the template:

POST to /ovirt-engine/api/storagedomains/19906b00-7f68-4af8-9672-cddbcb94f054/templates/1cec261a-c04d-4db1-92fa-ca87d72cfb93/register

body:
<action>
    <allow_partial_import>true</allow_partial_import>
    <async>false</async>
    <cluster>
        <name>golden_env_mixed_1</name>
    </cluster>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <reassign_bad_macs>true</reassign_bad_macs>
</action>
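For reference, a request body like the one above can be built programmatically. The following is an illustrative sketch using only Python's standard library; the element names are copied from the request body above, and nothing here is oVirt SDK code:

```python
import xml.etree.ElementTree as ET

def build_register_action(cluster_name, expiry=10, allow_partial=True):
    """Build the <action> body for the template register call above."""
    action = ET.Element("action")
    ET.SubElement(action, "allow_partial_import").text = str(allow_partial).lower()
    ET.SubElement(action, "async").text = "false"
    cluster = ET.SubElement(action, "cluster")
    ET.SubElement(cluster, "name").text = cluster_name
    grace = ET.SubElement(action, "grace_period")
    ET.SubElement(grace, "expiry").text = str(expiry)
    ET.SubElement(action, "reassign_bad_macs").text = "true"
    return ET.tostring(action, encoding="unicode")

body = build_register_action("golden_env_mixed_1")
```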

When trying to clone a VM from that template, the operation fails (sometimes) with the error:

- engine.log:
2017-02-15 13:10:30,237+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler10) [4247d907] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.
2017-02-15 13:10:30,260+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [4247d907] Command 'CopyImageGroupWithData' id: '31494d54-be60-45c5-9622-194d610f41e4' child commands '[ad2bf9ec-31da-4bdd-8a65-709e8e2a3c67, 07cefbd1-3f3e-4d8f-9fcb-5d6f3b3f9104]' executions were completed, status 'FAILED'
2017-02-15 13:10:31,281+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (DefaultQuartzScheduler9) [4247d907] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand' with failure.
2017-02-15 13:10:32,300+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [4b3ee393] Command 'CreateCloneOfTemplate' id: '918a847c-ef37-4c38-9a42-d645c80d98d5' child commands '[31494d54-be60-45c5-9622-194d610f41e4]' executions were completed, status 'FAILED'
2017-02-15 13:10:32,300+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [4b3ee393] Command 'CreateCloneOfTemplate' id: '918a847c-ef37-4c38-9a42-d645c80d98d5' Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2017-02-15 13:10:33,017+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler1) [4247d907] Setting new tasks map. The map contains now 3 tasks
2017-02-15 13:10:34,345+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [55e0b0b4] Command 'AddVmFromTemplate' id: '697a8c00-bbf9-4710-a481-641c1a259508' child commands '[918a847c-ef37-4c38-9a42-d645c80d98d5]' executions were completed, status 'FAILED'
2017-02-15 13:10:35,389+02 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler10) [55e0b0b4] Ending command 'org.ovirt.engine.core.bll.AddVmFromTemplateCommand' with failure.
2017-02-15 13:10:35,402+02 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (DefaultQuartzScheduler10) [4b3ee393] Ending command 'org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand' with failure.
2017-02-15 13:10:35,584+02 INFO  [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler10) [] Lock freed to object 'EngineLock:{exclusiveLocks='[vm_TestCase5200_REST_NFS_1512590643=<VM_NAME, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[1cec261a-c04d-4db1-92fa-ca87d72cfb93=<TEMPLATE, ACTION_TYPE_FAILED_TEMPLATE_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5200_REST_NFS_1512590643>, 63fb7230-4cb4-41eb-875d-6204803114d0=<DISK, ACTION_TYPE_FAILED_DISK_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5200_REST_NFS_1512590643>]'}'
2017-02-15 13:10:35,601+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [] EVENT_ID: USER_ADD_VM_FINISHED_FAILURE(60), Correlation ID: vms_create_4f68ccc3-f0d9-448d, Job ID: 1976d1f0-485c-4d24-8616-abf28deb1afc, Call Stack: null, Custom Event ID: -1, Message: Failed to complete VM vm_TestCase5200_REST_NFS_1512590643 creation.

- vdsm.log (spm):
2017-02-15 13:10:27,009 INFO  (tasks/6) [storage.SANLock] Acquiring Lease(name='e5509bb3-19a4-4359-bc95-628acbc696cf', path=u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__3/19906b00-7f68-4af8-9672-cddbcb94f054/images/ef063408-b7a7-4e5d-8367-f184cecec602/e5509bb3-19a4-4359-bc95-628acbc696cf.lease', offset=0) for host id 3 (clusterlock:343)
2017-02-15 13:10:27,010 ERROR (tasks/6) [storage.guarded] Error acquiring lock <VolumeLease ns=04_lease_19906b00-7f68-4af8-9672-cddbcb94f054, name=e5509bb3-19a4-4359-bc95-628acbc696cf, mode=exclusive at 0x27bfd50> (guarded:96)
2017-02-15 13:10:27,010 ERROR (tasks/6) [root] Job 'dc77d74b-210c-4b42-8a73-d9c18c5029d7' failed (jobs:217)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in run
    self._run()
  File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 64, in _run
    with guarded.context(self._source.locks + self._dest.locks):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__
    six.reraise(*exc)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__
    lock.acquire()
  File "/usr/share/vdsm/storage/volume.py", line 1392, in acquire
    dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id)
  File "/usr/share/vdsm/storage/sd.py", line 471, in acquireVolumeLease
    self._domainLock.acquire(hostId, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: u"id=19906b00-7f68-4af8-9672-cddbcb94f054, rc=28, out=Cannot acquire Lease(name='e5509bb3-19a4-4359-bc95-628acbc696cf', path=u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__3/19906b00-7f68-4af8-9672-cddbcb94f054/images/ef063408-b7a7-4e5d-8367-f184cecec602/e5509bb3-19a4-4359-bc95-628acbc696cf.lease', offset=0), err=(28, 'Sanlock resource not acquired', 'No space left on device')"

- vdsm.log (hsm):
2017-02-15 13:10:11,515 INFO  (jsonrpc/4) [dispatcher] Run and protect: getVolumeInfo(sdUUID='ad78e663-d94d-4542-9402-4c7554263f61', spUUID='6708e4f5-b4c9-40de-84d1-43b9e8ef31a5', imgUUID='403bfc8b-7094-494f-ac86-fb19e444373a', volUUID='7f586784-ea88-4c1e-a599-67f0b80b1ab2', options=None) (logUtils:49)
2017-02-15 13:10:11,521 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='1aa7b758-b6ac-41f3-9666-7cedd05de3af') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3054, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 748, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 361, in __init__
    manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/fileVolume.py", line 61, in __init__
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 84, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/volume.py", line 106, in validate
    self.validateImagePath()
  File "/usr/share/vdsm/storage/fileVolume.py", line 80, in validateImagePath
    raise se.ImagePathError(imageDir)
ImagePathError: Image path does not exist or cannot be accessed/created: (u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__2/ad78e663-d94d-4542-9402-4c7554263f61/images/403bfc8b-7094-494f-ac86-fb19e444373a',)
2017-02-15 13:10:11,522 INFO  (jsonrpc/4) [storage.TaskManager.Task] (Task='1aa7b758-b6ac-41f3-9666-7cedd05de3af') aborting: Task is aborted: 'Image path does not exist or cannot be accessed/created' - code 254 (task:1175)
2017-02-15 13:10:11,523 ERROR (jsonrpc/4) [storage.Dispatcher] {'status': {'message': "Image path does not exist or cannot be accessed/created: (u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__2/ad78e663-d94d-4542-9402-4c7554263f61/images/403bfc8b-7094-494f-ac86-fb19e444373a',)", 'code': 254}} (dispatcher:77)





Version-Release number of selected component (if applicable):
rhevm-4.1.1-0.1.el7

How reproducible:
50%


Steps to Reproduce:
1. Create a VM with 2 disks, one on an NFS SD and the second on an iSCSI SD
2. Create a template from that VM
3. Deactivate the iscsi SD and detach it from the DC
4. Remove it without formatting it
5. Add the iscsi storage domain back to the environment and attach it to the DC
6. Register the unregistered template on that SD
7. Try to clone a VM from that template
 
Actual results:
Sometimes, the operation will fail with the errors above

Expected results:
Operation should succeed

Additional info:

Comment 1 Maor 2017-02-15 13:42:48 UTC
It looks like the exception is that there is no space left on your device:

ddbcb94f054/images/ef063408-b7a7-4e5d-8367-f184cecec602/e5509bb3-19a4-4359-bc95-628acbc696cf.lease', offset=0), err=(28, 'Sanlock resource not acquired', 'No space left on device')"

Comment 2 Maor 2017-02-15 14:46:54 UTC
Just to be sure, I've tried to reproduce this on my env and it seems to work, so it just looks like a no-space issue.
Can you please try to reproduce this once again with a storage domain that has sufficient space?

Comment 3 Raz Tamir 2017-02-15 14:53:53 UTC
Maor,
I mentioned that it doesn't reproduce 100% of the time.
I have an iSCSI storage domain with 150GB of free space and an NFS storage domain with almost 2TB of free space.

Comment 4 Raz Tamir 2017-02-15 15:51:51 UTC
Update:
It seems this also happens when the template contains only one disk, so it has nothing to do with the disks being on 2 different storage domains.

Attaching new logs with a scenario of:
Steps to Reproduce:
1. Create a VM with 1 disk
2. Create a template from that VM
3. Deactivate the SD (where the disk resides) and detach it from the DC
4. Remove it without formatting it
5. Add the storage domain back to the environment and attach it to the DC
6. Register the unregistered template on that SD
7. Try to clone a VM from that template

Comment 5 Raz Tamir 2017-02-15 15:53:52 UTC
Created attachment 1250640 [details]
new logs

engine.log:
2017-02-15 17:41:36,800+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler10) [1e01c187] FINISH, GetHostJobsVDSCommand, return: {65b5a76b-c57c-4ae6-bb10-99a31cebd468=HostJobInfo:{id='65b5a76b-c57c-4ae6-bb10-99a31cebd468', type='storage', description='copy_data', status='failed', progress='null', error='VDSError:{code='AcquireLockFailure', message='Cannot obtain lock: u"id=d83faca9-974c-46a5-992c-8297eb63d3c4, rc=28, out=Cannot acquire Lease(name='48dfc33e-e17c-4bde-9389-4171eb6c0e5f', path=u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__3/d83faca9-974c-46a5-992c-8297eb63d3c4/images/f4446ba1-72f9-400c-8d23-a0a48fdcb363/48dfc33e-e17c-4bde-9389-4171eb6c0e5f.lease', offset=0), err=(28, 'Sanlock resource not acquired', 'No space left on device')"'}'}}, log id: 648001e2
2017-02-15 17:41:36,807+02 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler10) [1e01c187] Command CopyData id: 'a614ae1b-baef-4c0c-b5a9-2ff31e0d9bc7': job '65b5a76b-c57c-4ae6-bb10-99a31cebd468' execution was completed with VDSM job status 'failed'
2017-02-15 17:41:36,812+02 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler10) [1e01c187] Command CopyData id: 'a614ae1b-baef-4c0c-b5a9-2ff31e0d9bc7': execution was completed, the command status is 'FAILED'
2017-02-15 17:41:37,822+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler8) [1e01c187] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.
2017-02-15 17:41:38,848+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [1e01c187] Command 'CopyImageGroupWithData' id: '23668f9e-3f1c-42ec-8472-d4ef00e73f0b' child commands '[edd461a0-39e4-4838-b82d-785984cf25fe, a614ae1b-baef-4c0c-b5a9-2ff31e0d9bc7]' executions were completed, status 'FAILED'
2017-02-15 17:41:39,872+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (DefaultQuartzScheduler3) [1e01c187] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand' with failure.
2017-02-15 17:41:40,895+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [5abff62e] Command 'CreateCloneOfTemplate' id: '2bca104c-ddbf-41d2-9690-bcf4960b28af' child commands '[23668f9e-3f1c-42ec-8472-d4ef00e73f0b]' executions were completed, status 'FAILED'
2017-02-15 17:41:40,895+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [5abff62e] Command 'CreateCloneOfTemplate' id: '2bca104c-ddbf-41d2-9690-bcf4960b28af' Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2017-02-15 17:41:40,924+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [5ce0bede] Command 'ProcessOvfUpdateForStorageDomain' id: 'bc3b50ae-98d1-4c90-8201-b4d58a22a9f4' child commands '[b058ed5f-959a-4e30-b689-6fe0b12979ac, 3341751d-1b22-4670-a5f3-3e44aa7ce612]' executions were completed, status 'SUCCEEDED'
2017-02-15 17:41:41,955+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (DefaultQuartzScheduler3) [5ce0bede] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand' successfully.
2017-02-15 17:41:41,998+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler3) [32825ff9] Command 'AddVmFromTemplate' id: 'e042d606-5f18-40f2-bbe1-54cd64646599' child commands '[2bca104c-ddbf-41d2-9690-bcf4960b28af]' executions were completed, status 'FAILED'
2017-02-15 17:41:43,044+02 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler6) [32825ff9] Ending command 'org.ovirt.engine.core.bll.AddVmFromTemplateCommand' with failure.
2017-02-15 17:41:43,060+02 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (DefaultQuartzScheduler6) [5abff62e] Ending command 'org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand' with failure.
2017-02-15 17:41:43,259+02 INFO  [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler6) [] Lock freed to object 'EngineLock:{exclusiveLocks='[vm_TestCase5297_REST_NFS_1517325312=<VM_NAME, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[a48e19e3-4138-41d8-a4bf-b6734ed4c384=<DISK, ACTION_TYPE_FAILED_DISK_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5297_REST_NFS_1517325312>, 08445fa0-3631-47e6-a8bb-c34674d98f2a=<TEMPLATE, ACTION_TYPE_FAILED_TEMPLATE_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5297_REST_NFS_1517325312>]'}'
2017-02-15 17:41:43,278+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [] EVENT_ID: USER_ADD_VM_FINISHED_FAILURE(60), Correlation ID: vms_create_bd6648b9-7c67-44ff, Job ID: 22099f7c-4d9d-4f80-925b-88630e8a54fe, Call Stack: null, Custom Event ID: -1, Message: Failed to complete VM vm_TestCase5297_REST_NFS_1517325312 creation.



vdsm.log:
2017-02-15 17:41:34,540 INFO  (tasks/3) [storage.SANLock] Acquiring Lease(name='48dfc33e-e17c-4bde-9389-4171eb6c0e5f', path=u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__3/d83faca9-974c-46a5-992c-8297eb63d3c4/images/f4446ba1-72f9-400c-8d23-a0a48fdcb363/48dfc33e-e17c-4bde-9389-4171eb6c0e5f.lease', offset=0) for host id 3 (clusterlock:343)
2017-02-15 17:41:34,540 ERROR (tasks/3) [storage.guarded] Error acquiring lock <VolumeLease ns=04_lease_d83faca9-974c-46a5-992c-8297eb63d3c4, name=48dfc33e-e17c-4bde-9389-4171eb6c0e5f, mode=exclusive at 0x22a1a50> (guarded:96)
2017-02-15 17:41:34,541 ERROR (tasks/3) [root] Job '65b5a76b-c57c-4ae6-bb10-99a31cebd468' failed (jobs:217)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in run
    self._run()
  File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 64, in _run
    with guarded.context(self._source.locks + self._dest.locks):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__
    six.reraise(*exc)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__
    lock.acquire()
  File "/usr/share/vdsm/storage/volume.py", line 1392, in acquire
    dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id)
  File "/usr/share/vdsm/storage/sd.py", line 471, in acquireVolumeLease
    self._domainLock.acquire(hostId, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: u"id=d83faca9-974c-46a5-992c-8297eb63d3c4, rc=28, out=Cannot acquire Lease(name='48dfc33e-e17c-4bde-9389-4171eb6c0e5f', path=u'/rhev/data-center/mnt/10.35.110.11:_Storage__NFS_storage__local__ge5__nfs__3/d83faca9-974c-46a5-992c-8297eb63d3c4/images/f4446ba1-72f9-400c-8d23-a0a48fdcb363/48dfc33e-e17c-4bde-9389-4171eb6c0e5f.lease', offset=0), err=(28, 'Sanlock resource not acquired', 'No space left on device')"

Comment 6 Red Hat Bugzilla Rules Engine 2017-02-15 15:57:28 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 7 Maor 2017-02-19 14:08:53 UTC
Raz, can you please attach the sanlock logs?
They should probably be located in /var/log/sanlock.log

Comment 8 Raz Tamir 2017-02-21 17:02:13 UTC
Maor,
Attached is the sanlock.log from all three hosts in the environment,
and the engine.log from the reproduction:

2017-02-21 18:48:02,472+02 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler10) [770e0669] Command CopyData id: 'fe5da503-c05e-49e5-b22d-61817a0f5a84': job 'bedd3167-d83a-414c-94d4-42d7c9f44993' execution was completed with VDSM job status 'failed'
2017-02-21 18:48:02,477+02 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler10) [770e0669] Command CopyData id: 'fe5da503-c05e-49e5-b22d-61817a0f5a84': execution was completed, the command status is 'FAILED'
2017-02-21 18:48:03,491+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler4) [770e0669] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.
2017-02-21 18:48:03,518+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [770e0669] Command 'CopyImageGroupWithData' id: '865d2eeb-9b4a-4ea4-a921-fe1a25b85997' child commands '[13c0cb28-a107-402e-a3af-dd0eaf9d9cdd, fe5da503-c05e-49e5-b22d-61817a0f5a84]' executions were completed, status 'FAILED'
2017-02-21 18:48:04,545+02 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (DefaultQuartzScheduler1) [770e0669] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand' with failure.
2017-02-21 18:48:05,568+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler7) [5cfd5ee8] Command 'CreateCloneOfTemplate' id: '50af7a07-c957-4e8f-acc0-d420b8cfb623' child commands '[865d2eeb-9b4a-4ea4-a921-fe1a25b85997]' executions were completed, status 'FAILED'
2017-02-21 18:48:05,568+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler7) [5cfd5ee8] Command 'CreateCloneOfTemplate' id: '50af7a07-c957-4e8f-acc0-d420b8cfb623' Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2017-02-21 18:48:07,568+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Stage: Misc configuration
2017-02-21 18:48:07,573+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Stage: Transaction commit
2017-02-21 18:48:07,593+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Stage: Closing up
2017-02-21 18:48:07,598+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Stage: Pre-termination
2017-02-21 18:48:07,617+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-mgmt-20170221184807-storage-ge5-vdsm1.qa.lab.tlv.redhat.com-42cfb430.log'
2017-02-21 18:48:07,681+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler7) [36a465db] Command 'AddVmFromTemplate' id: '02dc548f-c341-4ac6-9069-fb435486fc7f' child commands '[50af7a07-c957-4e8f-acc0-d420b8cfb623]' executions were completed, status 'FAILED'
2017-02-21 18:48:07,721+02 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [42cfb430] Stage: Termination
2017-02-21 18:48:07,781+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-7-thread-2) [42cfb430] EVENT_ID: HOST_AVAILABLE_UPDATES_FINISHED(885), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Check for available updates on host host_mixed_1 was completed successfully with message 'no updates found.'.
2017-02-21 18:48:08,722+02 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler9) [36a465db] Ending command 'org.ovirt.engine.core.bll.AddVmFromTemplateCommand' with failure.
2017-02-21 18:48:08,738+02 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (DefaultQuartzScheduler9) [5cfd5ee8] Ending command 'org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand' with failure.
2017-02-21 18:48:08,943+02 INFO  [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (DefaultQuartzScheduler9) [] Lock freed to object 'EngineLock:{exclusiveLocks='[vm_TestCase5200_REST_ISCSI_2118474182=<VM_NAME, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='[f82c1509-e3ae-4ad0-a2c3-720d00080a06=<DISK, ACTION_TYPE_FAILED_DISK_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5200_REST_ISCSI_2118474182>, 08c187dc-1945-4692-8581-e45aa50d75af=<TEMPLATE, ACTION_TYPE_FAILED_TEMPLATE_IS_USED_FOR_CREATE_VM$VmName vm_TestCase5200_REST_ISCSI_2118474182>]'}'
2017-02-21 18:48:08,968+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler9) [] EVENT_ID: USER_ADD_VM_FINISHED_FAILURE(60), Correlation ID: vms_create_ffbb6125-6de3-4a8a, Job ID: e27c2727-5b43-4435-8841-5364cbc46956, Call Stack: null, Custom Event ID: -1, Message: Failed to complete VM vm_TestCase5200_REST_ISCSI_2118474182 creation.

Comment 9 Raz Tamir 2017-02-21 17:02:50 UTC
Created attachment 1256201 [details]
logs 21.2.17

Comment 10 Maor 2017-02-23 14:51:04 UTC
Thanks for the logs Raz,

After investigating it here are the following conclusions for now:

1. The sanlock exception 'No space left on device' is misleading; we can also get it if the lease file does not exist.

2. There was a behavior change: the host now acquires a lease for
the copy operation, so there is a dependency on the sanlock acquire
operation.
The host must first acquire a lease before the engine can call the
copy of the volume.
The lease can be acquired only after repoStats returns acquired='True'.

Once we activate the storage domain, there is no guarantee that the
host has acquired the sanlock lockspace yet, so we cannot use this
storage domain from this host for any storage operations that require
leases until the 'acquired' flag is true for this storage domain.
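The waiting described above can be sketched as a simple polling loop. This is illustrative only: the `repo_stats` callable, the shape of its return value, and the timeout are assumptions for the sketch, not the real vdsm API.

```python
import time

class DomainNotReady(Exception):
    pass

def wait_until_host_id_acquired(repo_stats, sd_uuid, timeout=60, interval=2):
    """Poll repoStats until the domain reports acquired=True.

    `repo_stats` is a callable returning a mapping like
    {sd_uuid: {"acquired": bool, ...}}; names are illustrative.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        stats = repo_stats()
        if stats.get(sd_uuid, {}).get("acquired"):
            return
        time.sleep(interval)
    raise DomainNotReady(
        "host id not acquired for domain %s within %ss" % (sd_uuid, timeout))
```

Callers would run this before starting any lease-based operation on a freshly activated domain.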

Comment 11 Maor 2017-02-28 14:55:24 UTC
(In reply to Maor from comment #10)
> Thanks for the logs Raz,
> 
> After investigating it here are the following conclusions for now:
> 
> 1. The sanlock exception 'No space left on device' is misleading, we can get
> it also if the lease file does not exists.
> 
> 2. Since there was a behavior change and now the host acquires a lease
> for the copy operation, there is a dependency on the sanlock acquire
> operation.
> The host must first acquire a lease before the engine can call the
> copy of the volume.
> The lease can be acquired only after repoStats returns acquired='True'
> 
> Once we activate the storage domain the host does not guarantee the
> sanlock was acquired yet, so we can not use this storage domain from
> this host to make any storage operations that requires leases until
> the 'acquired' flag is true for this storage domain.

This bug is mainly caused by the acquire not having completed before the creation of the disk took place.
Eventually, once the acquire succeeds, the operation should succeed.
The appropriate fix is to maintain an in-memory table of the acquire flag (true or false) per host and domain, updated in the monitoring phase.
Therefore I think it should be targeted to 4.2
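The merged fix listed in the gerrit links above ("concurrent: Add ValidatingEvent", "clusterlock: Wait until host id is acquired") went in a similar direction: an event that callers can wait on until the host id is acquired, and that can be invalidated when the lockspace goes away. Below is a minimal sketch of that idea, not the actual vdsm code:

```python
import threading

class ValidatingEvent(object):
    """Event that can be invalidated: wait() succeeds only if the
    event is set while still valid. Sketch of the concept, the real
    vdsm implementation differs in detail."""

    def __init__(self):
        self._event = threading.Event()
        self._valid = True
        self._lock = threading.Lock()

    @property
    def valid(self):
        with self._lock:
            return self._valid

    @valid.setter
    def valid(self, value):
        with self._lock:
            self._valid = value
            if not value:
                # Wake up waiters so they observe the invalidation.
                self._event.set()

    def set(self):
        self._event.set()

    def wait(self, timeout=None):
        if self._event.wait(timeout):
            return self.valid
        return False
```

A lease-based operation would call `wait()` on the domain's event instead of assuming the host id is already held.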

Comment 12 Red Hat Bugzilla Rules Engine 2017-02-28 14:56:38 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 13 Avihai 2017-04-09 13:31:59 UTC
The same issue occurred multiple times in the latest tier2 automation run (ovirt-engine-4.1.1.8-0.1.el7.noarch); marking it as an automation blocker.

Comment 14 Allon Mureinik 2017-04-09 14:21:33 UTC
(In reply to Maor from comment #10)
> Thanks for the logs Raz,
> 
> After investigating it here are the following conclusions for now:
> 
> 1. The sanlock exception 'No space left on device' is misleading, we can get
> it also if the lease file does not exists.
We need a bug on sanlock for this issue.

> 2. Since there was a behavior change and now the host acquires a lease
> for the copy operation, there is a dependency on the sanlock acquire
> operation.
> The host must first acquire a lease before the engine can call the
> copy of the volume.
> The lease can be acquired only after repoStats returns acquired='True'
> 
> Once we activate the storage domain the host does not guarantee the
> sanlock was acquired yet, so we can not use this storage domain from
> this host to make any storage operations that requires leases until
> the 'acquired' flag is true for this storage domain.

Would a simple sleep in the automation's code circumvent this?

Comment 15 Maor 2017-04-18 08:39:44 UTC
(In reply to Allon Mureinik from comment #14)
> (In reply to Maor from comment #10)
> > Thanks for the logs Raz,
> > 
> > After investigating it here are the following conclusions for now:
> > 
> > 1. The sanlock exception 'No space left on device' is misleading, we can get
> > it also if the lease file does not exists.
> We need a bug on sanlock for this issue.


Added a new bug:
 https://bugzilla.redhat.com/1442971 - misleading "No space left on device" message return from sanlock.

> 
> > 2. Since there was a behavior change and now the host acquires a lease
> > for the copy operation, there is a dependency on the sanlock acquire
> > operation.
> > The host must first acquire a lease before the engine can call the
> > copy of the volume.
> > The lease can be acquired only after repoStats returns acquired='True'
> > 
> > Once we activate the storage domain the host does not guarantee the
> > sanlock was acquired yet, so we can not use this storage domain from
> > this host to make any storage operations that requires leases until
> > the 'acquired' flag is true for this storage domain.
> 
> Would a simple sleep in the automation's code circumvent this?

It can be a suitable temporary fix for now.
David, I know you mentioned that 26 seconds might be too short for the lockspace to be ready to be used.
How much time would you recommend sleeping before acquiring the lease (until we add a proper fix)?
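As a stopgap in test automation, a retry wrapper is usually safer than one fixed sleep, since the acquire time varies with I/O delays. A hedged sketch follows; the callable, attempt count, delay, and the exception-name matching are placeholders for the sketch, not values taken from this bug:

```python
import time

def retry_on_acquire_failure(operation, attempts=5, delay=30,
                             retriable=("AcquireLockFailure",)):
    """Retry `operation` while the host id lease may still be acquiring.

    `operation` is any callable that raises an exception whose class
    name appears in `retriable` while the lease is not yet held.
    Placeholder workaround for automation, not a vdsm fix.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as e:
            if type(e).__name__ not in retriable or attempt == attempts:
                raise
            time.sleep(delay)
```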

Comment 16 David Teigland 2017-04-18 14:58:14 UTC
The minimum time is about 21 seconds, so with no external delays, 26 should be good.  But in cases we saw before, I think that there were i/o delays, and it's hard to predict in that case.  In theory, given max i/o delays (10 sec), a clean add_lockspace with a free host_id could take up to 50 seconds (3 i/o ops at 10 sec each, plus the 20 second algorithm wait.)
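Spelling out the worst-case arithmetic from the comment above (the constants come from the comment, not from sanlock's source):

```python
# Worst case for a clean add_lockspace with a free host_id, per the
# estimate above: 3 I/O operations, each delayed up to the 10 second
# max I/O delay, plus the fixed 20 second algorithm wait.
IO_OPS = 3
MAX_IO_DELAY = 10    # seconds
ALGORITHM_WAIT = 20  # seconds

worst_case_seconds = IO_OPS * MAX_IO_DELAY + ALGORITHM_WAIT
print(worst_case_seconds)  # 50
```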

Comment 17 Eyal Shenitzky 2017-05-01 08:27:35 UTC
This error also appears on cold move operations:

Reproduces ~50% of the time.

Steps to reproduce:

1. Create a VM with a disk
2. Move the VM disk to a different storage domain


2017-04-27 18:20:59,023+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (DefaultQuartzScheduler8) [] Lock freed to object 'EngineLock:{exclusiveLocks='[56b0acfa-36f9-4e00-883b-73fdd6b0c7dc=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName 5988_disk_virtiocow_2718195908>]', sharedLocks='[7a8ad949-b33c-4a22-993a-3562f6a4dd46=<VM, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName 5988_disk_virtiocow_2718195908>]'}'
2017-04-27 18:20:59,028+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler8) [] EVENT_ID: USER_MOVED_DISK_FINISHED_SUCCESS(2,010), Correlation ID: disks_syncAction_4f9ddf09-f0c7-48d8, Job ID: e56592e6-8f76-4db8-8383-117de9be2ed0, Call Stack: null, Custom Event ID: -1, Message: User admin@internal-authz finished moving disk 5988_disk_virtiocow_2718195908 to domain iscsi_2.
2017-04-27 18:21:00,033+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'a1969c0b-8c22-42d2-a21d-9e9d4a31c6d6:export_domain' report isn't an actual report
2017-04-27 18:21:00,033+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '7e06e771-287b-4119-82f3-ef7ba3bd515e:test_gluster_2' report isn't an actual report
2017-04-27 18:21:00,033+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '82d074d4-05b9-4be1-8e3f-8a01284d945a:test_gluster_1' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'a8463627-25ca-4488-bd1d-e6a2eac6d83b:test_gluster_0' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'b3925f20-df6c-4884-9b81-f6902de491c6:iscsi_1' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'd73f3d30-7c4d-4396-a74d-42e6f35b0925:iscsi_0' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '6e34c8c9-5060-479a-a12a-f768225a489f:nfs_2' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'fdfc4830-f8e8-481d-9f6f-9a7596bd7e01:nfs_1' report isn't an actual report
2017-04-27 18:21:00,034+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '18378f41-8261-4360-a017-3f28d8364ed4:nfs_0' report isn't an actual report
2017-04-27 18:21:00,035+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '5770c3a0-e767-40de-b495-d96d7a27e831:iscsi_2' report isn't an actual report
2017-04-27 18:21:05,243+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-4) [disks_syncAction_23987d98-0e8d-4d73] Running command: MoveDisksCommand internal: false. Entities affected :  ID: aec5ab78-da82-4268-ac49-59d2d4e4eef4 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2017-04-27 18:21:05,259+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-4) [disks_syncAction_23987d98-0e8d-4d73] Lock Acquired to object 'EngineLock:{exclusiveLocks='[aec5ab78-da82-4268-ac49-59d2d4e4eef4=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName 5988_disk_virtio_scsicow_2718195980>]', sharedLocks='[7a8ad949-b33c-4a22-993a-3562f6a4dd46=<VM, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName 5988_disk_virtio_scsicow_2718195980>]'}'
2017-04-27 18:21:05,318+03 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (org.ovirt.thread.pool-6-thread-28) [disks_syncAction_23987d98-0e8d-4d73] Running command: MoveOrCopyDiskCommand internal: false. Entities affected :  ID: aec5ab78-da82-4268-ac49-59d2d4e4eef4 Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER,  ID: b3925f20-df6c-4884-9b81-f6902de491c6 Type: StorageAction group CREATE_DISK with role type USER
2017-04-27 18:21:05,335+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.MoveImageGroupCommand] (org.ovirt.thread.pool-6-thread-28) [disks_syncAction_23987d98-0e8d-4d73] Running command: MoveImageGroupCommand internal: true. Entities affected :  ID: b3925f20-df6c-4884-9b81-f6902de491c6 Type: Storage
2017-04-27 18:21:05,351+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (org.ovirt.thread.pool-6-thread-28) [disks_syncAction_23987d98-0e8d-4d73] Running command: CopyImageGroupWithDataCommand internal: true.
2017-04-27 18:21:05,379+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (org.ovirt.thread.pool-6-thread-28) [disks_syncAction_23987d98-0e8d-4d73] Running command: CloneImageGroupVolumesStructureCommand internal: true.
2017-04-27 18:21:05,409+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-28) [disks_syncAction_23987d98-0e8d-4d73] EVENT_ID: USER_MOVED_DISK(2,008), Correlation ID: disks_syncAction_23987d98-0e8d-4d73, Job ID: 55936346-0439-4aec-a0a9-99703f7026da, Call Stack: null, Custom Event ID: -1, Message: User admin@internal-authz moving disk 5988_disk_virtio_scsicow_2718195980 to domain iscsi_1.
2017-04-27 18:21:06,044+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Command 'CopyImageGroupWithData' (id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7') waiting on child command id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef' type:'CloneImageGroupVolumesStructure' to complete
2017-04-27 18:21:06,046+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Starting child command 1 of 1, image 'f60ebc6e-059f-496d-b9bc-4edbee283706'
2017-04-27 18:21:06,086+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] START, GetVolumeInfoVDSCommand(HostName = host_mixed_1, GetVolumeInfoVDSCommandParameters:{runAsync='true', hostId='2d690294-ccde-4b0d-9faf-94c174a555dc', storagePoolId='d5c26715-43c0-4b8b-9edb-e7f1d689153c', storageDomainId='6e34c8c9-5060-479a-a12a-f768225a489f', imageGroupId='aec5ab78-da82-4268-ac49-59d2d4e4eef4', imageId='f60ebc6e-059f-496d-b9bc-4edbee283706'}), log id: 5c476676
2017-04-27 18:21:07,138+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@78a3d6b0, log id: 5c476676
2017-04-27 18:21:07,155+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Running command: CreateVolumeContainerCommand internal: true.
2017-04-27 18:21:07,169+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] START, CreateSnapshotVDSCommand( CreateSnapshotVDSCommandParameters:{runAsync='true', storagePoolId='d5c26715-43c0-4b8b-9edb-e7f1d689153c', ignoreFailoverLimit='false', storageDomainId='b3925f20-df6c-4884-9b81-f6902de491c6', imageGroupId='aec5ab78-da82-4268-ac49-59d2d4e4eef4', imageSizeInBytes='1073741824', volumeFormat='COW', newImageId='f60ebc6e-059f-496d-b9bc-4edbee283706', imageType='Sparse', newImageDescription='null', imageInitialSizeInBytes='186182', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000'}), log id: 485216f1
2017-04-27 18:21:07,170+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] -- executeIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2017-04-27 18:21:08,219+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] FINISH, CreateSnapshotVDSCommand, return: f60ebc6e-059f-496d-b9bc-4edbee283706, log id: 485216f1
2017-04-27 18:21:08,224+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command '92bcd488-6a03-45c1-8a4f-e932f40e94bc'
2017-04-27 18:21:08,224+03 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] CommandMultiAsyncTasks::attachTask: Attaching task 'e889bbb0-645e-4445-8e49-06331eeb3865' to command '92bcd488-6a03-45c1-8a4f-e932f40e94bc'.
2017-04-27 18:21:08,236+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Adding task 'e889bbb0-645e-4445-8e49-06331eeb3865' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet..
2017-04-27 18:21:08,244+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] BaseAsyncTask::startPollingTask: Starting to poll task 'e889bbb0-645e-4445-8e49-06331eeb3865'.
2017-04-27 18:21:08,254+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveImageGroup' (id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63') waiting on child command id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7' type:'CopyImageGroupWithData' to complete
2017-04-27 18:21:08,259+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveOrCopyDisk' (id: '2d553d2f-96b8-483d-a443-8f92f7afb4ff') waiting on child command id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63' type:'MoveImageGroup' to complete
2017-04-27 18:21:10,263+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [disks_syncAction_23987d98-0e8d-4d73] Command 'CopyImageGroupWithData' (id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7') waiting on child command id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef' type:'CloneImageGroupVolumesStructure' to complete
2017-04-27 18:21:10,267+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [disks_syncAction_23987d98-0e8d-4d73] Command 'CloneImageGroupVolumesStructure' (id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef') waiting on child command id: '92bcd488-6a03-45c1-8a4f-e932f40e94bc' type:'CreateVolumeContainer' to complete
2017-04-27 18:21:10,277+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveImageGroup' (id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63') waiting on child command id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7' type:'CopyImageGroupWithData' to complete
2017-04-27 18:21:10,282+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler5) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveOrCopyDisk' (id: '2d553d2f-96b8-483d-a443-8f92f7afb4ff') waiting on child command id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63' type:'MoveImageGroup' to complete
2017-04-27 18:21:10,703+03 INFO  [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (default task-23) [snapshots_create_54db28a9-18a1-4e00] Failed to Acquire Lock to object 'EngineLock:{exclusiveLocks='[7a8ad949-b33c-4a22-993a-3562f6a4dd46=<VM, ACTION_TYPE_FAILED_SNAPSHOT_IS_BEING_TAKEN_FOR_VM$VmName vm_TestCase19652_while_sna_2718193411>]', sharedLocks='null'}'
2017-04-27 18:21:10,703+03 WARN  [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (default task-23) [snapshots_create_54db28a9-18a1-4e00] Validation of action 'CreateAllSnapshotsFromVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED,$DiskName 5988_disk_virtio_scsicow_2718195980
2017-04-27 18:21:10,709+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-23) [] Operation Failed: [Cannot create Snapshot. Disk 5988_disk_virtio_scsicow_2718195980 is being moved or copied.]
2017-04-27 18:21:12,358+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler9) [] Setting new tasks map. The map contains now 10 tasks
2017-04-27 18:21:14,287+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command 'CopyImageGroupWithData' (id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7') waiting on child command id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef' type:'CloneImageGroupVolumesStructure' to complete
2017-04-27 18:21:14,289+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command 'CloneImageGroupVolumesStructure' (id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef') waiting on child command id: '92bcd488-6a03-45c1-8a4f-e932f40e94bc' type:'CreateVolumeContainer' to complete
2017-04-27 18:21:14,295+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveImageGroup' (id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63') waiting on child command id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7' type:'CopyImageGroupWithData' to complete
2017-04-27 18:21:14,300+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveOrCopyDisk' (id: '2d553d2f-96b8-483d-a443-8f92f7afb4ff') waiting on child command id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63' type:'MoveImageGroup' to complete
2017-04-27 18:21:15,130+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'a1969c0b-8c22-42d2-a21d-9e9d4a31c6d6:export_domain' report isn't an actual report
2017-04-27 18:21:15,130+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '7e06e771-287b-4119-82f3-ef7ba3bd515e:test_gluster_2' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '82d074d4-05b9-4be1-8e3f-8a01284d945a:test_gluster_1' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'a8463627-25ca-4488-bd1d-e6a2eac6d83b:test_gluster_0' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'b3925f20-df6c-4884-9b81-f6902de491c6:iscsi_1' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'd73f3d30-7c4d-4396-a74d-42e6f35b0925:iscsi_0' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '6e34c8c9-5060-479a-a12a-f768225a489f:nfs_2' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain 'fdfc4830-f8e8-481d-9f6f-9a7596bd7e01:nfs_1' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '18378f41-8261-4360-a017-3f28d8364ed4:nfs_0' report isn't an actual report
2017-04-27 18:21:15,131+03 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (DefaultQuartzScheduler7) [] Domain '5770c3a0-e767-40de-b495-d96d7a27e831:iscsi_2' report isn't an actual report
2017-04-27 18:21:18,217+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler4) [] Polling and updating Async Tasks: 10 tasks, 1 tasks to poll now
2017-04-27 18:21:19,227+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [] SPMAsyncTask::PollTask: Polling task 'e889bbb0-645e-4445-8e49-06331eeb3865' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'.
2017-04-27 18:21:19,231+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler4) [] BaseAsyncTask::onTaskEndSuccess: Task 'e889bbb0-645e-4445-8e49-06331eeb3865' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2017-04-27 18:21:19,232+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler4) [] CommandAsyncTask::endActionIfNecessary: All tasks of command '92bcd488-6a03-45c1-8a4f-e932f40e94bc' has ended -> executing 'endAction'
2017-04-27 18:21:19,232+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler4) [] CommandAsyncTask::endAction: Ending action for '1' tasks (command ID: '92bcd488-6a03-45c1-8a4f-e932f40e94bc'): calling endAction '.
2017-04-27 18:21:19,232+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-11) [] CommandAsyncTask::endCommandAction [within thread] context: Attempting to endAction 'CreateVolumeContainer',
2017-04-27 18:21:19,239+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand' successfully.
2017-04-27 18:21:19,248+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' completed, handling the result.
2017-04-27 18:21:19,248+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'CreateVolumeContainer' succeeded, clearing tasks.
2017-04-27 18:21:19,248+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] SPMAsyncTask::ClearAsyncTask: Attempting to clear task 'e889bbb0-645e-4445-8e49-06331eeb3865'
2017-04-27 18:21:19,249+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] START, SPMClearTaskVDSCommand( SPMTaskGuidBaseVDSCommandParameters:{runAsync='true', storagePoolId='d5c26715-43c0-4b8b-9edb-e7f1d689153c', ignoreFailoverLimit='false', taskId='e889bbb0-645e-4445-8e49-06331eeb3865'}), log id: 49ae97da
2017-04-27 18:21:19,249+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] START, HSMClearTaskVDSCommand(HostName = host_mixed_1, HSMTaskGuidBaseVDSCommandParameters:{runAsync='true', hostId='2d690294-ccde-4b0d-9faf-94c174a555dc', taskId='e889bbb0-645e-4445-8e49-06331eeb3865'}), log id: 7e4329dc
2017-04-27 18:21:20,268+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] FINISH, HSMClearTaskVDSCommand, log id: 7e4329dc
2017-04-27 18:21:20,268+03 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SPMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] FINISH, SPMClearTaskVDSCommand, log id: 49ae97da
2017-04-27 18:21:20,272+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] BaseAsyncTask::removeTaskFromDB: Removed task 'e889bbb0-645e-4445-8e49-06331eeb3865' from DataBase
2017-04-27 18:21:20,272+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-6-thread-11) [disks_syncAction_23987d98-0e8d-4d73] CommandAsyncTask::HandleEndActionResult [within thread]: Removing CommandMultiAsyncTasks object for entity '92bcd488-6a03-45c1-8a4f-e932f40e94bc'
2017-04-27 18:21:22,308+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler2) [disks_syncAction_23987d98-0e8d-4d73] Command 'CopyImageGroupWithData' (id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7') waiting on child command id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef' type:'CloneImageGroupVolumesStructure' to complete
2017-04-27 18:21:22,311+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler2) [disks_syncAction_23987d98-0e8d-4d73] Command 'CloneImageGroupVolumesStructure' id: '6b1559da-8631-4cbf-b26e-d1aae8fa78ef' child commands '[92bcd488-6a03-45c1-8a4f-e932f40e94bc]' executions were completed, status 'SUCCEEDED'
2017-04-27 18:21:22,323+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler2) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveImageGroup' (id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63') waiting on child command id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7' type:'CopyImageGroupWithData' to complete
2017-04-27 18:21:22,327+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler2) [disks_syncAction_23987d98-0e8d-4d73] Command 'MoveOrCopyDisk' (id: '2d553d2f-96b8-483d-a443-8f92f7afb4ff') waiting on child command id: '26ecc5b8-4ae2-4944-9f92-53f55962ac63' type:'MoveImageGroup' to complete
2017-04-27 18:21:23,332+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (DefaultQuartzScheduler8) [disks_syncAction_23987d98-0e8d-4d73] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand' successfully.
2017-04-27 18:21:24,356+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Running command: CopyImageGroupVolumesDataCommand internal: true.
2017-04-27 18:21:25,371+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] Starting child command 1 of 1, image 'f60ebc6e-059f-496d-b9bc-4edbee283706'
2017-04-27 18:21:25,391+03 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] Running command: CopyDataCommand internal: true.
2017-04-27 18:21:25,394+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] START, GetVolumeInfoVDSCommand(HostName = host_mixed_3, GetVolumeInfoVDSCommandParameters:{runAsync='true', hostId='39610ce7-c2f3-4d11-b0be-2b00e91a0262', storagePoolId='d5c26715-43c0-4b8b-9edb-e7f1d689153c', storageDomainId='b3925f20-df6c-4884-9b81-f6902de491c6', imageGroupId='aec5ab78-da82-4268-ac49-59d2d4e4eef4', imageId='f60ebc6e-059f-496d-b9bc-4edbee283706'}), log id: 5f27d7c7
2017-04-27 18:21:26,618+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@33cff16e, log id: 5f27d7c7
2017-04-27 18:21:26,628+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CopyVolumeDataVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] START, CopyVolumeDataVDSCommand(HostName = host_mixed_1, CopyVolumeDataVDSCommandParameters:{runAsync='true', hostId='2d690294-ccde-4b0d-9faf-94c174a555dc', storageDomainId='null', jobId='9e1eb880-ecdb-4497-b1e4-8dd4cb9cddea', srcInfo='VdsmImageLocationInfo [storageDomainId=6e34c8c9-5060-479a-a12a-f768225a489f, imageGroupId=aec5ab78-da82-4268-ac49-59d2d4e4eef4, imageId=f60ebc6e-059f-496d-b9bc-4edbee283706, generation=null]', dstInfo='VdsmImageLocationInfo [storageDomainId=b3925f20-df6c-4884-9b81-f6902de491c6, imageGroupId=aec5ab78-da82-4268-ac49-59d2d4e4eef4, imageId=f60ebc6e-059f-496d-b9bc-4edbee283706, generation=0]', collapse='false'}), log id: 2675af56
2017-04-27 18:21:26,628+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CopyVolumeDataVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] -- executeVdsBrokerCommand: calling 'copyVolumeData'
2017-04-27 18:21:26,991+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CopyVolumeDataVDSCommand] (DefaultQuartzScheduler3) [disks_syncAction_23987d98-0e8d-4d73] FINISH, CopyVolumeDataVDSCommand, log id: 2675af56
2017-04-27 18:21:28,004+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command 'CopyImageGroupWithData' (id: '1923c6a9-0d4e-421c-aa20-bf9c87c215e7') waiting on child command id: '7a364022-650b-4a69-b632-b97fb9bf65c7' type:'CopyImageGroupVolumesData' to complete
2017-04-27 18:21:28,008+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] START, GetHostJobsVDSCommand(HostName = host_mixed_1, GetHostJobsVDSCommandParameters:{runAsync='true', hostId='2d690294-ccde-4b0d-9faf-94c174a555dc', type='storage', jobIds='[9e1eb880-ecdb-4497-b1e4-8dd4cb9cddea]'}), log id: 42e9cf83
2017-04-27 18:21:28,269+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] FINISH, GetHostJobsVDSCommand, return: {9e1eb880-ecdb-4497-b1e4-8dd4cb9cddea=HostJobInfo:{id='9e1eb880-ecdb-4497-b1e4-8dd4cb9cddea', type='storage', description='copy_data', status='failed', progress='null', error='VDSError:{code='AcquireLockFailure', message='Cannot obtain lock: "id=b3925f20-df6c-4884-9b81-f6902de491c6, rc=28, out=Cannot acquire Lease(name='f60ebc6e-059f-496d-b9bc-4edbee283706', path='/dev/b3925f20-df6c-4884-9b81-f6902de491c6/leases', offset=112197632), err=(28, 'Sanlock resource not acquired', 'No space left on device')"'}'}}, log id: 42e9cf83
2017-04-27 18:21:28,273+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command CopyData id: 'e73550b9-31c3-49d6-84a4-f94563a7706d': job '9e1eb880-ecdb-4497-b1e4-8dd4cb9cddea' execution was completed with VDSM job status 'failed'
2017-04-27 18:21:28,276+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler6) [disks_syncAction_23987d98-0e8d-4d73] Command CopyData id: 'e73550b9-31c3-49d6-84a4-f94563a7706d': execution was completed, the command status is 'FAILED'

Comment 18 Raz Tamir 2017-05-24 13:48:51 UTC
Cold merge operations also fail with this error.

engine.log:

2017-05-22 19:09:02,822+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [snapshots_delete_6d3ae6f3-2e39-487b] Command 'RemoveSnapshot' (id: '1b95e34b-fda2-4955-91f7-ade1b4fe6835') waiting on child command id: 'ca04ae0f-7270-4698-b321-e21397601a6d' type:'ColdMergeSnapshotSingleDisk' to complete
2017-05-22 19:09:02,826+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [snapshots_delete_6d3ae6f3-2e39-487b] Command 'ColdMergeSnapshotSingleDisk' (id: 'ca04ae0f-7270-4698-b321-e21397601a6d') waiting on child command id: 'ef70a801-39d4-4995-ad74-78c7782d7ca7' type:'PrepareMerge' to complete
2017-05-22 19:09:04,971+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler1) [] Polling and updating Async Tasks: 7 tasks, 5 tasks to poll now
2017-05-22 19:09:05,994+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler1) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-05-22 19:09:06,001+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler1) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: Cannot obtain lock
2017-05-22 19:09:06,001+03 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [] SPMAsyncTask::PollTask: Polling task 'b664f3cc-0548-440a-83a4-0f4a9c502904' (Parent Command 'PrepareMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2017-05-22 19:09:06,005+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler1) [] BaseAsyncTask::logEndTaskFailure: Task 'b664f3cc-0548-440a-83a4-0f4a9c502904' (Parent Command 'PrepareMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot obtain lock, code = 651',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Cannot obtain lock, code = 651'
2017-05-22 19:09:06,006+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (DefaultQuartzScheduler1) [] CommandAsyncTask::endActionIfNecessary: All tasks of command 'ef70a801-39d4-4995-ad74-78c7782d7ca7' has ended -> executing 'endAction'

vdsm.log

2017-05-22 19:09:00,332+0300 DEBUG (tasks/3) [storage.ResourceManager] Trying to release resource '00_storage.82cba7ce-8693-46b2-94ae-610993913b6c' (resourceManager:566)
2017-05-22 19:09:00,332+0300 DEBUG (tasks/3) [storage.ResourceManager] Released resource '00_storage.82cba7ce-8693-46b2-94ae-610993913b6c' (1 active users) (resourceManager:585)
2017-05-22 19:09:00,333+0300 ERROR (tasks/3) [storage.TaskManager.Task] (Task='b664f3cc-0548-440a-83a4-0f4a9c502904') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1884, in prepareMerge
    merge.prepare(subchainInfo)
  File "/usr/share/vdsm/storage/merge.py", line 181, in prepare
    with guarded.context(subchain.locks):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__
    six.reraise(*exc)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__
    lock.acquire()
  File "/usr/share/vdsm/storage/volume.py", line 1418, in acquire
    dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id)
  File "/usr/share/vdsm/storage/sd.py", line 471, in acquireVolumeLease
    self._domainLock.acquire(hostId, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: u"id=82cba7ce-8693-46b2-94ae-610993913b6c, rc=28, out=Cannot acquire Lease(name='56483875-9204-4b1b-9656-de2d1414ef51', path=u'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__1__nfs__1/82cba7ce-8693-46b2-94ae-610993913b6c/images/8f58c5f8-05e7-48c1-8fd8-43c4087f0053/56483875-9204-4b1b-9656-de2d1414ef51.lease', offset=0), err=(28, 'Sanlock resource not acquired', 'No space left on device')"
2017-05-22 19:09:00,333+0300 DEBUG (tasks/3) [storage.TaskManager.Task] (Task='b664f3cc-0548-440a-83a4-0f4a9c502904') Task._run: b664f3cc-0548-440a-83a4-0f4a9c502904 () {} failed - stopping task (task:889)

** Logs attached.

Comment 19 Raz Tamir 2017-05-24 13:49:46 UTC
Created attachment 1281970 [details]
engine and vdsm.log cold merge

Comment 20 Maor 2017-06-04 10:17:26 UTC
Before any job runs on a random HSM, we need to make sure that this HSM has acquired a host id on the storage domain.
There are different ways to handle this issue:
1. Through the Engine -
Monitoring - Monitoring takes place every time a host is activated or a storage domain moves from the unknown/inactive state to active; we can change the monitoring to also take into account the acquired flag returned from repoStats.
Regarding activation, there is a known issue described in BZ1043047 - we can add an in-memory matrix mapping each host's acquired flag per storage domain, so that when a job is executed the engine filters for hosts that have the acquired flag for that storage domain.

2. Through VDSM - Use the cluster lock module so that every lease acquire automatically waits for the host id if it has not been acquired yet.
acquire() will wait on an event in the cluster lock; the event will be set in hasHostId() and in acquireHostID.
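The VDSM-side approach can be sketched with a plain threading.Event. This is a hypothetical simplification: HostIdWaiter and its method names are made up here and stand in for vdsm's clusterlock/ValidatingEvent machinery; it only illustrates the idea of blocking a lease acquire until host id registration completes instead of failing immediately.

```python
import threading

class HostIdWaiter:
    """Hypothetical stand-in for vdsm's cluster lock with a host-id event."""

    def __init__(self):
        self._host_id_acquired = threading.Event()

    def acquire_host_id(self, host_id):
        # In real vdsm this registers the host id with sanlock; here we
        # just flip the event once registration has "completed".
        self._host_id_acquired.set()

    def acquire(self, host_id, lease, timeout=10):
        # Wait for the host id instead of racing with its acquisition,
        # which is what produced "Sanlock resource not acquired".
        if not self._host_id_acquired.wait(timeout):
            raise RuntimeError(
                "Cannot acquire %s: host id %s not acquired" % (lease, host_id))
        return "lease %s acquired by host %s" % (lease, host_id)

lock = HostIdWaiter()
# Host id acquisition completes asynchronously, as the domain monitor does.
t = threading.Timer(0.1, lock.acquire_host_id, args=(1,))
t.start()
result = lock.acquire(1, "vol-lease")
t.join()
```

The key design point is that acquire() no longer assumes the host id is already registered; it converts the race into a bounded wait.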

Comment 21 Nir Soffer 2017-06-05 05:58:10 UTC
Raz, can you test a proposed solution on vdsm side?
https://gerrit.ovirt.org/77765

Comment 22 Red Hat Bugzilla Rules Engine 2017-06-05 12:33:21 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 23 Raz Tamir 2017-06-06 08:17:21 UTC
*** Bug 1459058 has been marked as a duplicate of this bug. ***

Comment 24 Eyal Shenitzky 2017-06-06 08:28:22 UTC
The issue also occurs during live storage migration:

VDSM:

2017-06-05 11:54:46,591+0300 INFO  (jsonrpc/2) [vdsm.api] START sdm_copy_data(job_id='12fa9df4-9357-42de-9950-a83d3dd02533', source={'img_id': '2cc28ab4-7e16-4106-b8f2-00e49c64a29c', 'sd_id': '28df2ec8-431c-4f90-b18a-58d9d13882d7', 'endpoint_type': 'div', 'vol_id': 'c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1'}, destination={'generation': 0, 'img_id': '2cc28ab4-7e16-4106-b8f2-00e49c64a29c', 'sd_id': '45f6b9b4-6b94-44db-a740-2494f05bffa4', 'endpoint_type': 'div', 'vol_id': 'c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1'}) from=::ffff:10.35.69.13,33844, flow_id=disks_syncAction_ac2ff124-a7a0-47ba (api:46)
2017-06-05 11:54:46,592+0300 INFO  (jsonrpc/2) [vdsm.api] FINISH sdm_copy_data return=None from=::ffff:10.35.69.13,33844, flow_id=disks_syncAction_ac2ff124-a7a0-47ba (api:52)
2017-06-05 11:54:46,592+0300 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call SDM.copy_data succeeded in 0.00 seconds (__init__:539)
2017-06-05 11:54:46,593+0300 INFO  (tasks/9) [storage.ThreadPool.WorkerThread] START task b31cd91b-f35b-4f15-95b5-5de63808760f (cmd=<bound method Task.commit of <storage.task.Task instance at 0x7f71e4244440>>, args=None) (threadPool:208)
2017-06-05 11:54:46,593+0300 INFO  (tasks/9) [root] Running job '12fa9df4-9357-42de-9950-a83d3dd02533'... (jobs:179)
2017-06-05 11:54:46,820+0300 INFO  (tasks/9) [storage.LVM] Refreshing lvs: vg=45f6b9b4-6b94-44db-a740-2494f05bffa4 lvs=['leases'] (lvm:1291)
2017-06-05 11:54:46,821+0300 INFO  (tasks/9) [storage.LVM] Refreshing LVs (vg=45f6b9b4-6b94-44db-a740-2494f05bffa4, lvs=['leases']) (lvm:1319)
2017-06-05 11:54:46,929+0300 INFO  (tasks/9) [storage.SANLock] Acquiring Lease(name='c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1', path='/dev/45f6b9b4-6b94-44db-a740-2494f05bffa4/leases', offset=113246208) for host id 1 (clusterlock:343)
2017-06-05 11:54:46,930+0300 ERROR (tasks/9) [storage.guarded] Error acquiring lock <VolumeLease ns=04_lease_45f6b9b4-6b94-44db-a740-2494f05bffa4, name=c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1, mode=exclusive at 0x3acaf10> (guarded:96)
2017-06-05 11:54:46,931+0300 ERROR (tasks/9) [root] Job '12fa9df4-9357-42de-9950-a83d3dd02533' failed (jobs:217)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in run
    self._run()
  File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 64, in _run
    with guarded.context(self._source.locks + self._dest.locks):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__
    six.reraise(*exc)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__
    lock.acquire()
  File "/usr/share/vdsm/storage/volume.py", line 1418, in acquire
    dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id)
  File "/usr/share/vdsm/storage/sd.py", line 471, in acquireVolumeLease
    self._domainLock.acquire(hostId, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 362, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: "id=45f6b9b4-6b94-44db-a740-2494f05bffa4, rc=28, out=Cannot acquire Lease(name='c5a08ac5-20bc-4d44-9f4c-bc6a53d661d1', path='/dev/45f6b9b4-6b94-44db-a740-2494f05bffa4/leases', offset=113246208), err=(28, 'Sanlock resource not acquired', 'No space left on device')"
2017-06-05 11:54:46,932+0300 INFO  (tasks/9) [root] Job '12fa9df4-9357-42de-9950-a83d3dd02533' will be deleted in 3600 seconds (jobs:245)


Move disk can fail with an error message in the engine:


2017-06-05 11:54:49,428+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler6) [disks_syncAction_ac2ff124-a7a0-47ba] Command CopyData id: 'c9803db7-9802-4a0d-80a3-a22fc42dd19f': execution was completed, the command status is 'FAILED'
2017-06-05 11:54:49,432+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler6) [disks_syncAction_ac2ff124-a7a0-47ba] Command 'CopyImageGroupWithData' (id: 'e774b141-5221-488f-b621-853ca8a7e78d') waiting on child command id: 'd413e773-b740-4570-96ad-d0df0e5f311f' type:'CopyImageGroupVolumesData' to complete
2017-06-05 11:54:50,439+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler9) [disks_syncAction_ac2ff124-a7a0-47ba] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.

Comment 25 Maor 2017-06-13 08:47:46 UTC
We are currently testing the following solution in VDSM:
  https://gerrit.ovirt.org/#/c/77765/

Once we get feedback on the automation test results, we will know whether to move the bug to POST.

Comment 26 Raz Tamir 2017-06-16 11:56:08 UTC
Removing the needinfo for now.
Please set it again when there's a patch we can test

Comment 27 Allon Mureinik 2017-06-18 11:03:38 UTC
(In reply to Raz Tamir from comment #26)
> Removing the needinfo for now.
> Please set it again when there's a patch we can test

Still the same patch:
(In reply to Maor from comment #25)
> We are currently testing the following solution in VDSM:
>   https://gerrit.ovirt.org/#/c/77765/
> 
> Once we will get feedback on the automation tests' result we will know
> whether to move the bug to post or not.

Not sure what removing the needinfo means.
Was this tested and failed? Some other feedback?

Comment 28 Nir Soffer 2017-06-18 19:17:54 UTC
(In reply to Allon Mureinik from comment #27)
> (In reply to Raz Tamir from comment #26)
> > Removing the needinfo for now.
> > Please set it again when there's a patch we can test
> 
> Still the same patch:
> (In reply to Maor from comment #25)
> > We are currently testing the following solution in VDSM:
> >   https://gerrit.ovirt.org/#/c/77765/
> > 
> > Once we will get feedback on the automation tests' result we will know
> > whether to move the bug to post or not.
> 
> Not sure what removing the needinfo means.
> Was this tested and failed? Some other feedback?

Last time, master was broken because of the persistent VM change. We did not see
the errors related to this bug, but some errors may have been masked by persistent
VM failures.

The patches are ready now for testing again:
https://gerrit.ovirt.org/77765

(Note: you must check out this patch, not cherry-pick it)
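The checkout-not-cherry-pick advice matters because the fix is a series of dependent patches: checking out a Gerrit patch gives you that commit and all of its parents, while cherry-picking applies only the one commit. A small demonstration with a hypothetical throwaway repository (all branch and file names here are made up, not vdsm's):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.email tester@example.com
git config user.name tester
echo base > f
git add f
git commit -qm base
# Build a two-patch series where the second patch depends on the first.
git checkout -qb series
echo patch1 >> f
git commit -qam patch1
echo patch2 >> f
git commit -qam patch2
git checkout -q main
# Cherry-picking only the series tip misses its parent (patch1) and
# conflicts, because patch2's context line ("patch1") is absent here.
if ! git cherry-pick series >/dev/null 2>&1; then
    git cherry-pick --abort
    echo "cherry-pick of the tip alone failed"
fi
# Checking out the tip brings the whole series, parents included.
git checkout -q series
grep -q patch1 f && grep -q patch2 f && echo "full series present"
```

The same logic applies to a Gerrit download: the "checkout" download command lands you on the full patch series, which is why a cherry-picked build produced tracebacks whose line numbers did not match the fix.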

Comment 29 Raz Tamir 2017-06-28 13:36:43 UTC
Execution finished - cold move disk flow:
We are still seeing the same error in the logs
engine.log:

2017-06-28 14:06:02,177+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CopyVolumeDataVDSCommand] (DefaultQuartzScheduler2) [disks_syncAction_dbf04eaf-c0d0-461b] FINISH, CopyVolumeDataVDSCommand, log id: 315b1146
2017-06-28 14:06:02,187+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler2) [disks_syncAction_dbf04eaf-c0d0-461b] START, GetHostJobsVDSCommand(HostName = host_mixed_1, GetHostJobsVDSCommandParameters:{runAsync='true', hostId='ba644554-9f4b-4bc9-98c3-3a7ea9b9b6ff', type='storage', jobIds='[76e45363-e748-4ef3-acd3-5a0aaf38085f]'}), log id: 1bb99606
2017-06-28 14:06:02,190+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler2) [disks_syncAction_dbf04eaf-c0d0-461b] FINISH, GetHostJobsVDSCommand, return: {76e45363-e748-4ef3-acd3-5a0aaf38085f=HostJobInfo:{id='76e45363-e748-4ef3-acd3-5a0aaf38085f', type='storage', description='copy_data', status='running', progress='null', error='null'}}, log id: 1bb99606
2017-06-28 14:06:02,193+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler2) [disks_syncAction_dbf04eaf-c0d0-461b] Command CopyData id: 'bfc5a129-c8f3-485a-ae5f-8da8c6cbdfb7': waiting for job '76e45363-e748-4ef3-acd3-5a0aaf38085f' on host 'host_mixed_1' (id: 'ba644554-9f4b-4bc9-98c3-3a7ea9b9b6ff') to complete
2017-06-28 14:06:03,198+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'CopyImageGroupWithData' (id: '8a5db46c-2514-4b59-bf55-2e916420d97e') waiting on child command id: '91692eae-33b1-4089-936a-ff62b8eaeb8e' type:'CopyImageGroupVolumesData' to complete
2017-06-28 14:06:04,201+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler9) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'CopyImageGroupVolumesData' (id: '91692eae-33b1-4089-936a-ff62b8eaeb8e') waiting on child command id: 'bfc5a129-c8f3-485a-ae5f-8da8c6cbdfb7' type:'CopyData' to complete
2017-06-28 14:06:04,205+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler9) [disks_syncAction_dbf04eaf-c0d0-461b] START, GetHostJobsVDSCommand(HostName = host_mixed_1, GetHostJobsVDSCommandParameters:{runAsync='true', hostId='ba644554-9f4b-4bc9-98c3-3a7ea9b9b6ff', type='storage', jobIds='[76e45363-e748-4ef3-acd3-5a0aaf38085f]'}), log id: dcbc7f0
2017-06-28 14:06:04,208+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (DefaultQuartzScheduler9) [disks_syncAction_dbf04eaf-c0d0-461b] FINISH, GetHostJobsVDSCommand, return: {76e45363-e748-4ef3-acd3-5a0aaf38085f=HostJobInfo:{id='76e45363-e748-4ef3-acd3-5a0aaf38085f', type='storage', description='copy_data', status='failed', progress='null', error='VDSError:{code='AcquireLockFailure', message='Cannot obtain lock: "id=34cacfb6-fc56-4e66-8db4-7dcdb387c310, rc=28, out=Cannot acquire Lease(name='57562b48-e0dc-4373-b974-e87fbfea8889', path='/dev/34cacfb6-fc56-4e66-8db4-7dcdb387c310/leases', offset=109051904), err=(28, 'Sanlock resource not acquired', 'No space left on device')"'}'}}, log id: dcbc7f0
2017-06-28 14:06:04,210+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler9) [disks_syncAction_dbf04eaf-c0d0-461b] Command CopyData id: 'bfc5a129-c8f3-485a-ae5f-8da8c6cbdfb7': job '76e45363-e748-4ef3-acd3-5a0aaf38085f' execution was completed with VDSM job status 'failed'
2017-06-28 14:06:04,213+03 INFO  [org.ovirt.engine.core.bll.StorageJobCallback] (DefaultQuartzScheduler9) [disks_syncAction_dbf04eaf-c0d0-461b] Command CopyData id: 'bfc5a129-c8f3-485a-ae5f-8da8c6cbdfb7': execution was completed, the command status is 'FAILED'
2017-06-28 14:06:05,219+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (DefaultQuartzScheduler7) [disks_syncAction_dbf04eaf-c0d0-461b] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.
2017-06-28 14:06:06,229+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler10) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'CopyImageGroupVolumesData' id: '91692eae-33b1-4089-936a-ff62b8eaeb8e' child commands '[bfc5a129-c8f3-485a-ae5f-8da8c6cbdfb7]' executions were completed, status 'FAILED'
2017-06-28 14:06:07,236+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand] (DefaultQuartzScheduler8) [disks_syncAction_dbf04eaf-c0d0-461b] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand' with failure.
2017-06-28 14:06:07,244+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'CopyImageGroupWithData' id: '8a5db46c-2514-4b59-bf55-2e916420d97e' child commands '[891f957d-ac2d-4694-8155-b93c99158cf1, 91692eae-33b1-4089-936a-ff62b8eaeb8e]' executions were completed, status 'FAILED'
2017-06-28 14:06:08,260+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (DefaultQuartzScheduler10) [disks_syncAction_dbf04eaf-c0d0-461b] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand' with failure.
2017-06-28 14:06:09,287+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler1) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'MoveImageGroup' id: '548fc410-fe4b-4cd3-a15f-e684bf402532' child commands '[8a5db46c-2514-4b59-bf55-2e916420d97e]' executions were completed, status 'FAILED'
2017-06-28 14:06:09,287+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler1) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'MoveImageGroup' id: '548fc410-fe4b-4cd3-a15f-e684bf402532' Updating status to 'FAILED', The command end method logic will be executed by one of its parent commands.
2017-06-28 14:06:10,301+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler4) [disks_syncAction_dbf04eaf-c0d0-461b] Command 'MoveOrCopyDisk' id: '06a9e571-c636-41bc-bebd-74e6c9e4efbe' child commands '[548fc410-fe4b-4cd3-a15f-e684bf402532]' executions were completed, status 'FAILED'
2017-06-28 14:06:11,037+03 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler1) [7e93839a] Setting new tasks map. The map contains now 2 tasks
2017-06-28 14:06:11,314+03 ERROR [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (DefaultQuartzScheduler8) [disks_syncAction_dbf04eaf-c0d0-461b] Ending command 'org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand' with failure.
2017-06-28 14:06:11,321+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.MoveImageGroupCommand] (DefaultQuartzScheduler8) [disks_syncAction_dbf04eaf-c0d0-461b] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.MoveImageGroupCommand' with failure.

vdsm.log:

/dev/34cacfb6-fc56-4e66-8db4-7dcdb387c310/leases', offset=109051904) for host id 1 (clusterlock:342)
2017-06-28 14:06:02,605+0300 ERROR (tasks/4) [storage.guarded] Error acquiring lock <VolumeLease ns=04_lease_34cacfb6-fc56-4e66-8db4-7dcdb387c310, name=57562b48-e0dc-4373-b974-e87fbfea8889, mode=exclusive at 0x22fef50> (guarded:96)
2017-06-28 14:06:02,606+0300 ERROR (tasks/4) [root] Job u'76e45363-e748-4ef3-acd3-5a0aaf38085f' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 61, in _run
    with guarded.context(self._source.locks + self._dest.locks):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 102, in __enter__
    six.reraise(*exc)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/guarded.py", line 93, in __enter__
    lock.acquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1519, in acquire
    dom.acquireVolumeLease(self._host_id, self._img_id, self._vol_id)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 478, in acquireVolumeLease
    self._domainLock.acquire(hostId, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 361, in acquire
    "Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock: "id=34cacfb6-fc56-4e66-8db4-7dcdb387c310, rc=28, out=Cannot acquire Lease(name='57562b48-e0dc-4373-b974-e87fbfea8889', path='/dev/34cacfb6-fc56-4e66-8db4-7dcdb387c310/leases', offset=109051904), err=(28, 'Sanlock resource not acquired', 'No space left on device')"

Comment 30 Raz Tamir 2017-06-28 13:37:27 UTC
Created attachment 1292653 [details]
logs with patch

Comment 31 Nir Soffer 2017-06-28 15:48:22 UTC
Raz, can you share the output of
rpm -q vdsm

Comment 32 Raz Tamir 2017-06-28 16:03:51 UTC
The environment was already reprovisioned so I don't have this info anymore

Comment 33 Nir Soffer 2017-06-28 16:11:00 UTC
The line numbers in the traceback do not match the code. It looks like this test was using only the first patch or master. 

Please make sure you run version d0e95ceef2.

Comment 34 Raz Tamir 2017-06-28 18:09:18 UTC
(In reply to Nir Soffer from comment #31)
> Raz, can you share the output of
> rpm -q vdsm

Found it - vdsm-4.20.1-77.git6045110.el7.x86_64

Is that the right version?

Comment 35 Nir Soffer 2017-06-28 19:01:30 UTC
(In reply to Raz Tamir from comment #34)
> (In reply to Nir Soffer from comment #31)
> > Raz, can you share the output of
> > rpm -q vdsm
> 
> Found it - vdsm-4.20.1-77.git6045110.el7.x86_64
> 
> Is That the right version?

No, this is a patch from master that does not include the fix:
https://gerrit.ovirt.org/78386

The patch we need to test is:
https://gerrit.ovirt.org/77765

You must check out this patch, not cherry-pick it; you need all 4 patches in this
series.

The version should be: vdsm-4.20.1-91.gitd0e95ce.el7.x86_64

Comment 37 Nir Soffer 2017-07-06 18:24:40 UTC
I'm removing the test infrastructure patch from the bug, since the test infra has
changed too much since 4.1, and we cannot backport the tests.

Comment 38 Maor 2017-07-20 07:18:59 UTC
Nir, can you please add the URL of the bug that blocks this bug from being merged downstream?

Comment 39 Nir Soffer 2017-07-22 22:46:39 UTC
We want to resolve first bug 1469175.

Comment 40 Raz Tamir 2017-08-01 11:35:12 UTC
Raising severity: in our last tier 2 run, 80% of the failures are caused by this bug.

Comment 41 Maor 2017-08-01 12:29:02 UTC
I think that marking it as urgent is pointless, since this bug is blocked on BZ1469175; there is not much we can do other than check BZ1469175 from time to time.

Comment 42 Allon Mureinik 2017-08-01 13:53:15 UTC
(In reply to Maor from comment #41)
> I think that marking it as urgent is pointless since the bug is blocked on
> BZ1469175 so there is not much we can do other than check  BZ1469175 from
> time to time.

Bug 1469175 is marked as MODIFIED, so we can go ahead and fix this one now.

Comment 43 Maor 2017-08-01 14:40:56 UTC
(In reply to Allon Mureinik from comment #42)
> (In reply to Maor from comment #41)
> > I think that marking it as urgent is pointless since the bug is blocked on
> > BZ1469175 so there is not much we can do other than check  BZ1469175 from
> > time to time.
> 
> Bug 1469175 is marked as MODIFIED, so we can go ahead and fix this one now.

Rebased the backport patches, please review:
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:ovirt-4.1+topic:wait-for-host-id

Comment 44 Raz Tamir 2017-08-14 12:14:06 UTC
No trace of this bug in our automation - tier 1/2/3 on latest build ovirt-engine-4.1.5.2-0.1.el7.noarch
