Bug 1529990

Summary: Error creating a new volume on snapshot creation with 'Failed to get shared "write"'
Product: Red Hat Enterprise Linux 7 Reporter: Raz Tamir <ratamir>
Component: qemu-kvm-rhevAssignee: Fam Zheng <famz>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.5CC: ailan, amureini, bugs, chayang, coli, juzhang, michal.skrivanek, michen, nsoffer, pingl, qizhu, qzhang, ratamir, rbalakri, virt-maint, yhong, yilzhang
Target Milestone: rcKeywords: Automation, Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-01-03 09:22:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1525303    
Attachments:
Description Flags
engine and vdsm logs none

Description Raz Tamir 2017-12-31 17:12:53 UTC
Created attachment 1375012 [details]
engine and vdsm logs

Description of problem:
On our automation tests (Tier1) with RHEL 7.5, I see that snapshot creation failed with:

engine.log:

2017-12-31 16:02:00,329+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-12-31 16:02:00,334+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host
_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume: ('Volume creation 68fe6781-8b6f-424b-b71d-ce0d37cb5c5e failed: Cannot clone volume: u\'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.
tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/b3bd29e2-3649-4c58-9f94-1132964d9b63, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.c
om:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Command [\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qco
w2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', \\\'b3bd29e2-3649-4c58-9f94-1132964d9b63\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99
523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e\\\'] failed with rc=1 out=\\\'\\\' err=\\\'qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.r
edhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Failed to get shared "write" lock\\\\nIs another process using t
he image?\\\\nCould not open backing image to determine size.\\\\n\\\'\'',)
2017-12-31 16:02:00,334+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-12-31 16:02:00,336+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume: ('Volume creation 2ab42022-6d95-45fc-a28d-651fe5e4473d failed: Cannot clone volume: u\'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/6dfe3e42-9a63-40d6-8d7d-23f1fccb7d91/64bb1241-dae1-4c2a-9209-40ce5d42fa92, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/6dfe3e42-9a63-40d6-8d7d-23f1fccb7d91/2ab42022-6d95-45fc-a28d-651fe5e4473d: Command [\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', \\\'64bb1241-dae1-4c2a-9209-40ce5d42fa92\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/6dfe3e42-9a63-40d6-8d7d-23f1fccb7d91/2ab42022-6d95-45fc-a28d-651fe5e4473d\\\'] failed with rc=1 out=\\\'\\\' err=\\\'qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/6dfe3e42-9a63-40d6-8d7d-23f1fccb7d91/2ab42022-6d95-45fc-a28d-651fe5e4473d: Failed to get shared "write" lock\\\\nIs another process using the image?\\\\nCould not open backing image to determine size.\\\\n\\\'\'',)
2017-12-31 16:02:00,337+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-12-31 16:02:00,339+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume: ('Volume creation 02aff061-8e57-44ef-b458-3d52d7fa7b61 failed: Cannot clone volume: u\'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/5a2bd447-f717-4d5c-8ba5-7cdea32b79d4/12b71d75-3493-4230-8e7d-66d748ad4089, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/5a2bd447-f717-4d5c-8ba5-7cdea32b79d4/02aff061-8e57-44ef-b458-3d52d7fa7b61: Command [\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', \\\'12b71d75-3493-4230-8e7d-66d748ad4089\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/5a2bd447-f717-4d5c-8ba5-7cdea32b79d4/02aff061-8e57-44ef-b458-3d52d7fa7b61\\\'] failed with rc=1 out=\\\'\\\' err=\\\'qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/5a2bd447-f717-4d5c-8ba5-7cdea32b79d4/02aff061-8e57-44ef-b458-3d52d7fa7b61: Failed to get shared "write" lock\\\\nIs another process using the image?\\\\nCould not open backing image to determine size.\\\\n\\\'\'',)
2017-12-31 16:02:00,339+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] SPMAsyncTask::PollTask: Polling task '866b19ce-b079-4264-ac56-f368a658567f' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2017-12-31 16:02:00,343+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-96) [] BaseAsyncTask::logEndTaskFailure: Task '866b19ce-b079-4264-ac56-f368a658567f' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume: ('Volume creation 68fe6781-8b6f-424b-b71d-ce0d37cb5c5e failed: Cannot clone volume: u\'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/b3bd29e2-3649-4c58-9f94-1132964d9b63, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Command [\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', \\\'b3bd29e2-3649-4c58-9f94-1132964d9b63\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e\\\'] failed with rc=1 out=\\\'\\\' err=\\\'qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Failed to get shared "write" lock\\\\nIs another process using the image?\\\\nCould not open backing image to determine size.\\\\n\\\'\'',), code = 205',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume: ('Volume creation 68fe6781-8b6f-424b-b71d-ce0d37cb5c5e failed: Cannot clone volume: u\'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/b3bd29e2-3649-4c58-9f94-1132964d9b63, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Command [\\\'/usr/bin/qemu-img\\\', \\\'create\\\', \\\'-f\\\', \\\'qcow2\\\', \\\'-o\\\', \\\'compat=1.1\\\', \\\'-b\\\', \\\'b3bd29e2-3649-4c58-9f94-1132964d9b63\\\', \\\'-F\\\', \\\'qcow2\\\', u\\\'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e\\\'] failed with rc=1 out=\\\'\\\' err=\\\'qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Failed to get shared "write" lock\\\\nIs another process using the image?\\\\nCould not open backing image to determine size.\\\\n\\\'\'',), code = 205'


vdsm.log:

2017-12-31 16:01:53,305+0200 ERROR (tasks/0) [storage.Volume] cannot clone image d4e78894-e79c-42fd-8f39-2d35e42d710a volume b3bd29e2-3649-4c58-9f94-1132964d9b63 to /rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.re
dhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e (volume:988)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 984, in clone
    backingFormat=sc.fmt2str(self.getFormat()))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 121, in create
    _run_cmd(cmd, cwd=cwdPath)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 346, in _run_cmd
    raise cmdutils.Error(cmd, rc, out, err)
Error: Command ['/usr/bin/qemu-img', 'create', '-f', 'qcow2', '-o', 'compat=1.1', '-b', 'b3bd29e2-3649-4c58-9f94-1132964d9b63', '-F', 'qcow2', u'/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__
4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e'] failed with rc=1 out='' err='qemu-img: /rhev/data-center/mnt/vserver-nas01.qa.lab.tl
v.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Failed to get shared "write" lock\nIs another process using t
he image?\nCould not open backing image to determine size.\n'
2017-12-31 16:01:53,311+0200 DEBUG (jsonrpc/4) [storage.TaskManager.Task] (Task='83256f85-c1ab-4402-8943-5794c43d4a31') _save: orig /rhev/data-center/46b93857-336f-427c-8d48-75c9f020d280/mastersd/master/tasks/83256
f85-c1ab-4402-8943-5794c43d4a31 temp /rhev/data-center/46b93857-336f-427c-8d48-75c9f020d280/mastersd/master/tasks/83256f85-c1ab-4402-8943-5794c43d4a31.temp (task:759)
2017-12-31 16:01:53,314+0200 ERROR (tasks/0) [storage.Volume] Unexpected error (volume:1199)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1156, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileVolume.py", line 468, in _create
    volParent.clone(volPath, volFormat)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 993, in clone
    raise se.CannotCloneVolume(self.volumePath, dstPath, str(e))
CannotCloneVolume: Cannot clone volume: u'src=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/b3bd29e2
-3649-4c58-9f94-1132964d9b63, dst=/rhev/data-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b
71d-ce0d37cb5c5e: Command [\'/usr/bin/qemu-img\', \'create\', \'-f\', \'qcow2\', \'-o\', \'compat=1.1\', \'-b\', \'b3bd29e2-3649-4c58-9f94-1132964d9b63\', \'-F\', \'qcow2\', u\'/rhev/data-center/mnt/vserver-nas01.q
a.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e\'] failed with rc=1 out=\'\' err=\'qemu-img: /rhev/dat
a-center/mnt/vserver-nas01.qa.lab.tlv.redhat.com:_nas01_ge__4__nfs__0/62f99523-5b06-4af0-a8c4-21b17f3fbcc2/images/d4e78894-e79c-42fd-8f39-2d35e42d710a/68fe6781-8b6f-424b-b71d-ce0d37cb5c5e: Failed to get shared "wri
te" lock\\nIs another process using the image?\\nCould not open backing image to determine size.\\n\''



Version-Release number of selected component (if applicable):
vdsm-4.20.9.3-1.el7ev.x86_64
qemu-img-rhev-2.10.0-13.el7.x86_64
libvirt-3.9.0-6.el7.x86_64
rhvm-4.2.0.2-0.1.el7.noarch

How reproducible:
Not sure

Steps to Reproduce:
1. Create snapshot for VM with 4 disks
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Yaniv Kaul 2018-01-01 07:47:07 UTC
Raz, this is clearly a qemu-img regression. Please move to the right component.

Comment 4 Yaniv Kaul 2018-01-01 09:12:28 UTC
(In reply to Yaniv Kaul from comment #3)
> Raz, this is clearly a qemu-img regression. Please move to the right
> component.

Whoops - looks like the right component.

Comment 6 Nir Soffer 2018-01-01 13:35:06 UTC
This looks like qemu-img regression, but we don't have enough info on the error:

    Failed to get shared "write" lock\\nIs another process using the image?\\nCould not open backing image to determine size.\\n\''

The error should include the underlying error from the system. Does it mean the file
was not found, or the file exists but could not be locked?

Adding the value of strerror(errno) will make it easier to support.

Comment 7 Nir Soffer 2018-01-01 13:38:43 UTC
Raz, can you give more details about the flow? is this live or cold snapshot?

The bug does not include the minimal info for reproducing.

Comment 8 Raz Tamir 2018-01-01 14:16:30 UTC
(In reply to Nir Soffer from comment #7)
> Raz, can you give more details about the flow? is this live or cold snapshot?
> 
> The bug does not include the minimal info for reproducing.

Every snapshot creation failed in the execution (tier1) this bug was found in

Steps to Reproduce:
1. Create snapshot for VM with 4 disks

* why 4 disks? :
We are creating disks from all valid combinations of:
- interface
- format
- allocation-policy

Per storage type (block, file)

Comment 9 Michal Skrivanek 2018-01-01 14:49:46 UTC
(In reply to Yaniv Kaul from comment #4)
> (In reply to Yaniv Kaul from comment #3)
> > Raz, this is clearly a qemu-img regression. Please move to the right
> > component.
> 
> Whoops - looks like the right component.

This one's better

Comment 10 Nir Soffer 2018-01-01 14:54:00 UTC
(In reply to Raz Tamir from comment #8)
> Steps to Reproduce:
> 1. Create snapshot for VM with 4 disks

The vm is running when we create the snapshot?

Comment 11 Raz Tamir 2018-01-01 15:18:40 UTC
(In reply to Nir Soffer from comment #10)
> (In reply to Raz Tamir from comment #8)
> > Steps to Reproduce:
> > 1. Create snapshot for VM with 4 disks
> 
> The vm is running when we create the snapshot?

Yes,

Missed that, it was live snapshot

Comment 12 Ping Li 2018-01-02 05:14:25 UTC
The bug should be a duplicate of bug 1526212

Comment 13 Ping Li 2018-01-03 09:22:23 UTC

*** This bug has been marked as a duplicate of bug 1526212 ***