Bug 1431613

Summary: CopyImageError when cloning vm from snapshot on iscsi
Product: [oVirt] ovirt-engine Reporter: Lilach Zitnitski <lzitnits>
Component: BLL.StorageAssignee: Fred Rolland <frolland>
Status: CLOSED DUPLICATE QA Contact: Lilach Zitnitski <lzitnits>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.1.4CC: bugs, lzitnits, tnisan
Target Milestone: ovirt-4.1.2Keywords: Regression
Target Release: ---Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-28 08:50:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
CREATE request
none
logs
none
logs 2 none

Description Lilach Zitnitski 2017-03-13 12:29:15 UTC
Description of problem:
Cloning vm from snapshot on iscsi disks fails with CopyImageError and low level image copy errors. 
When running the same tests with snapshot on nfs disks there are no failures, and when trying to clone vm from snapshot manually there are no failures as well. 
The problem occurs when running this test in our automation tests with Create vm request in the REST API (the request body will be attached in a file) 

Version-Release number of selected component (if applicable):
rhevm-4.1.1.4-0.1.el7.noarch
vdsm-4.19.7-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create vm with disks from iscsi storage domain
2. create snapshot with all vm's disks
3. run from the API the create_vm request

Actual results:
clone vm is failing while trying to create it

Expected results:
the cloned vm should be created

Additional info:

engine.log

2017-03-13 11:15:05,336+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [5b35d98b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed
2017-03-13 11:15:05,340+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) [5b35d98b] BaseAsyncTask::logEndTaskFailure: Task '1ab3d889-66dd-486a-9ffb-d74d5afb1370' (Parent Command 'AddVmFromSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261'

vdsm.log

2017-03-13 11:13:47,189+0200 ERROR (tasks/9) [storage.Image] conversion failure for volume 76a3ba38-6893-4423-9806-1de99ecea882 (image:881)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 876, in copyCollapsed
    self._wait_for_qemuimg_operation(operation)
  File "/usr/share/vdsm/storage/image.py", line 137, in _wait_for_qemuimg_operation
    operation.wait_for_completion()
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 329, in wait_for_completion
    self.poll(timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 324, in poll
    self.error)
QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/rhev/data-center/4f3d521b-4b6c-403a-ac0f-976ec9d952ad/6f458208-173e-4906-85ca-7d14883c8b78/images/adb30f73-49b9-413e-926d-f61aa36b9d5d/76a3ba38-6893-4423-9806-1de99ecea882', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/6f458208-173e-4906-85ca-7d14883c8b78/images/af35370b-edc4-4b8a-9d63-28710fc604aa/1f4b766f-d54a-435c-983c-3afec4587836'], ecode=1, stdout=, stderr=qemu-img: error while writing sector 2093056: No space left on device
, message=None
2017-03-13 11:13:47,190+0200 ERROR (tasks/9) [storage.Image] Unexpected error (image:894)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 882, in copyCollapsed
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ("cmd=['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/rhev/data-center/4f3d521b-4b6c-403a-ac0f-976ec9d952ad/6f458208-173e-4906-85ca-7d14883c8b78/images/adb30f73-49b9-413e-926d-f61aa36b9d5d/76a3ba38-6893-4423-9806-1de99ecea882', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/6f458208-173e-4906-85ca-7d14883c8b78/images/af35370b-edc4-4b8a-9d63-28710fc604aa/1f4b766f-d54a-435c-983c-3afec4587836'], ecode=1, stdout=, stderr=qemu-img: error while writing sector 2093056: No space left on device\n, message=None",)

Comment 1 Lilach Zitnitski 2017-03-13 12:30:03 UTC
Created attachment 1262429 [details]
CREATE request

Comment 2 Lilach Zitnitski 2017-03-13 12:30:32 UTC
Created attachment 1262430 [details]
logs

engine and vdsm

Comment 3 Tal Nisan 2017-03-13 14:05:27 UTC
Just to be clear, regular clone from snapshot on iSCSI works as it should but only the automation is failing?

Comment 4 Lilach Zitnitski 2017-03-13 14:08:45 UTC
(In reply to Tal Nisan from comment #3)
> Just to be clear, regular clone from snapshot on iSCSI works as it should
> but only the automation is failing?

Yes, that's why I added the REST request body because it can be reproduced only with that.

Comment 5 Yaniv Kaul 2017-03-14 09:08:10 UTC
(In reply to Lilach Zitnitski from comment #4)
> (In reply to Tal Nisan from comment #3)
> > Just to be clear, regular clone from snapshot on iSCSI works as it should
> > but only the automation is failing?
> 
> Yes, that's why I added the REST request body because it can be reproduced
> only with that.

"No space left on device" seems to hint of the issue. Please check.

Comment 6 Lilach Zitnitski 2017-03-15 05:54:50 UTC
I don't think that's the problem because when I'm trying to clone vm from the same snapshot and from the same vm and disks, manually, it's working. 
Also, when running the automated test I can see both srcVol and dstVol are 1G and the destination SD has 100G free.

Comment 7 Lilach Zitnitski 2017-03-15 05:58:45 UTC
Created attachment 1263180 [details]
logs 2

vdsm and engine logs with successful CopyImage

Comment 8 Raz Tamir 2017-03-15 07:18:21 UTC
I'm Adding the Regression keyword as this issue already happened to us in the past and fixed - https://bugzilla.redhat.com/show_bug.cgi?id=1201268

Comment 9 Red Hat Bugzilla Rules Engine 2017-03-15 07:18:28 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 10 Fred Rolland 2017-03-28 08:50:17 UTC

*** This bug has been marked as a duplicate of bug 1435967 ***