Bug 1435967 - creating template from VM with converted disk format fails only on iscsi SD - RAW to QCOW
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.19.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.2
Target Release: 4.19.11
Assignee: Fred Rolland
QA Contact: Avihai
URL:
Whiteboard:
Duplicates: 1431613
Depends On:
Blocks:
Reported: 2017-03-26 10:29 UTC by Avihai
Modified: 2017-05-23 08:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-23 08:11:26 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+


Attachments
Engine , SPM + HSM logs (1.36 MB, application/x-gzip)
2017-03-26 10:29 UTC, Avihai
strace of qemu convert with data on source LV (2.23 MB, text/plain)
2017-04-03 14:39 UTC, Fred Rolland
strace of qemu convert with no data on source LV (656.08 KB, text/plain)
2017-04-03 14:40 UTC, Fred Rolland


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 74704 0 master MERGED image: calculate destination allocation size 2017-04-02 12:24:23 UTC
oVirt gerrit 74814 0 master MERGED image: change chainSizeCalc method name 2017-04-02 12:05:06 UTC
oVirt gerrit 75018 0 ovirt-4.1 MERGED image: change chainSizeCalc method name 2017-04-03 09:16:02 UTC
oVirt gerrit 75019 0 ovirt-4.1 MERGED image: calculate destination allocation size 2017-04-03 09:15:26 UTC

Description Avihai 2017-03-26 10:29:33 UTC
Created attachment 1266504 [details]
Engine , SPM + HSM logs

Description of problem:
Creating a template from a VM on an iSCSI storage domain fails when the disk format is changed (converted).

This issue occurs only when the target SD is iSCSI.

For example:
Created a VM with a thin disk and tried to create a template with the thin disk converted to raw, with an iSCSI target SD -> template creation fails.


Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-4.1.1.6-0.1.el7.noarch

VDSM:
4.19.6-1


How reproducible:
100%

Steps to Reproduce:
1. Create a VM.
2. Add disk1 to the VM -> iscsi_1 SD, preallocated & bootable, 1G in size.
3. Add disk2 to the VM -> iscsi_1 SD, thin provisioned, 6G in size.
4. Make a template from the VM -> change the format of disk1 from "RAW" to "QCOW" -> press OK.

Actual results:
Template creation fails

Expected results:
Template creation should succeed.


Additional info:
It looks like the conversion of the volume format is the root cause of this bug.

More Notes:
1) I also tried to make a template changing only disk2 from "QCOW" to "RAW" and got the same issue.
2) When I created the template without converting the format of the disks, the template was created successfully.


Engine log:
2017-03-26 11:57:02,241+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler7) [9504eef] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-03-26 11:57:02,262+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler7) [9504eef] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command HSMGetAllTasksStatusesVDS failed: low level Image copy failed
2017-03-26 11:57:02,357+03 ERROR [org.ovirt.engine.core.bll.AddVmTemplateCommand] (org.ovirt.thread.pool-6-thread-48) [ea70a597-9d8d-4216-bff3-70c4e4179034] Ending command 'org.ovirt.engine.core.bll.AddVmTemplateCommand' with failure.
2017-03-26 11:57:02,376+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateImageTemplateCommand] (org.ovirt.thread.pool-6-thread-48) [3a09cb97] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CreateImageTemplateCommand' with failure.

VDSM log:
2017-03-26 11:56:57,970+0300 ERROR (tasks/8) [storage.Image] conversion failure for volume b0b533bc-4bd9-48c0-8799-3781e8bb1f48 (image:881)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 876, in copyCollapsed
    self._wait_for_qemuimg_operation(operation)
  File "/usr/share/vdsm/storage/image.py", line 137, in _wait_for_qemuimg_operation
    operation.wait_for_completion()
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 329, in wait_for_completion
    self.poll(timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 324, in poll
    self.error)
QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-0', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', '/rhev/data-center/978f3ace-33cd-4310-aca4-6d59c92007d6/0dc02e0f-8243-4033-a23c-c0542d1295d0/images/d62957d3-94ab-4d35-a9d3-6489ed8751fb/b0b533bc-4bd9-48c0-8799-3781e8bb1f48', '-O', 'qcow2', '-o', 'compat=1.1', '/rhev/data-center/mnt/blockSD/0dc02e0f-8243-4033-a23c-c0542d1295d0/images/e3adac5e-b4d8-4d5d-a70e-454d49c132e4/21477997-23e8-470f-ac25-e284d4319cf2'], ecode=1, stdout=, stderr=qemu-img: error while writing sector 2093056: No space left on device
, message=None
2017-03-26 11:56:57,972+0300 ERROR (tasks/8) [storage.Image] Unexpected error (image:894)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 882, in copyCollapsed
    raise se.CopyImageError(str(e))
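
The sector number in the qemu-img error above can be turned into a byte offset to see how far the copy got before space ran out. This is a minimal sketch, not part of the original report. It assumes 512-byte sectors, assumes the reported sector is a position within the image being copied, and assumes the image is the 1G disk1 from the reproduction steps (the log itself does not state the size):

```python
# Convert the failing sector from the qemu-img error into a byte offset.
SECTOR_SIZE = 512            # assumption: sectors are reported in 512-byte units
MiB = 1024 ** 2
GiB = 1024 ** 3

failing_sector = 2093056     # from "error while writing sector 2093056"
assumed_image_size = 1 * GiB # assumption: the 1G disk1 from the steps above

offset = failing_sector * SECTOR_SIZE
remaining = assumed_image_size - offset

print("failing offset: %d bytes (%d MiB)" % (offset, offset // MiB))
print("left to copy under the 1 GiB assumption: %d MiB" % (remaining // MiB))
```

Under those assumptions the copy had reached 1022 MiB of a 1024 MiB image when the destination ran out of space, which would be consistent with a destination that is only slightly too small.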

Comment 3 Allon Mureinik 2017-03-26 15:13:39 UTC
From a very superficial scrubbing, the root cause ("No space left on device") seems similar to the error in bug 1434927 and even in bug 1435813, which was encountered out in the field with RHV 4.0.6.

I wonder if we stumbled on a qemu-img issue.

Avihai - what qemu-img-rhev version are you using?
Could you downgrade it and retry the flow?

Comment 4 Avihai 2017-03-26 20:44:42 UTC
(In reply to Allon Mureinik from comment #3)
> From a very superficial scrubbing, the root cause ("No space left on
> device") seems similar to the error in bug 1434927 and even in bug 1435813,
> which was encountered out in the field with RHV 4.0.6.
> 
> I wonder if we stumbled on a qemu-img issue.
> 
> Avihai - what qemu-img-rhev version are you using?
> Could you downgrade it and retry the flow?

Allon,
The current qemu-img on the hosts is:
qemu-img-rhev-2.6.0-28.el7_3.8.x86_64

Please provide the procedure to downgrade qemu-img, and the build to downgrade to.

Comment 5 Allon Mureinik 2017-03-27 03:39:55 UTC
(In reply to Avihai from comment #4)
> (In reply to Allon Mureinik from comment #3)
> > From a very superficial scrubbing, the root cause ("No space left on
> > device") seems similar to the error in bug 1434927 and even in bug 1435813,
> > which was encountered out in the field with RHV 4.0.6.
> > 
> > I wonder if we stumbled on a qemu-img issue.
> > 
> > Avihai - what qemu-img-rhev version are you using?
> > Could you downgrade it and retry the flow?
> 
> Alon , 
> current qemu image on hosts is :
> qemu-img-rhev-2.6.0-28.el7_3.8.x86_64
> 
> Please add procedure to downgrade qemu-img & to what build .

Running "yum list qemu-img" should give you a list of available versions. You can then use "yum downgrade qemu-img-<version>" to downgrade it. Let's try for something older than qemu-img-rhev-2.6.0-27.el7.x86_64 that was reported in bug 1435813.

Comment 6 Fred Rolland 2017-03-27 09:20:13 UTC
> 1) Also tried to make template changing only disk2 from "QCOW" to "RAW" &
> got the same issue.

Do you have logs for this issue?
We need to see if it is the same root cause and open a different bug for it.

Comment 7 Fred Rolland 2017-03-27 09:42:28 UTC
Can you run this command on the SPM and provide the output?
rpm -qa | grep qemu

Comment 8 Avihai 2017-03-27 13:28:45 UTC
(In reply to Fred Rolland from comment #6)
> > 1) Also tried to make template changing only disk2 from "QCOW" to "RAW" &
> > got the same issue.
> 
> Do you have logs for this issue ?
> We need to see if it is the same root cause and open a different bug for it.

1. About this section: I rechecked, and the full scenario is:

Make a template with both of the following changes:
disk1 from "RAW" to "QCOW" & disk2 from "QCOW" to "RAW".

2. About "rpm -qa | grep qemu" on the SPM:

qemu-kvm-common-rhev-2.6.0-28.el7_3.8.x86_64
qemu-img-rhev-2.6.0-28.el7_3.8.x86_64
qemu-kvm-tools-rhev-2.6.0-28.el7_3.8.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.8.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.5.x86_64

Comment 9 Avihai 2017-03-27 13:47:17 UTC
(In reply to Allon Mureinik from comment #5)
> (In reply to Avihai from comment #4)
> > (In reply to Allon Mureinik from comment #3)
> > > From a very superficial scrubbing, the root cause ("No space left on
> > > device") seems similar to the error in bug 1434927 and even in bug 1435813,
> > > which was encountered out in the field with RHV 4.0.6.
> > > 
> > > I wonder if we stumbled on a qemu-img issue.
> > > 
> > > Avihai - what qemu-img-rhev version are you using?
> > > Could you downgrade it and retry the flow?
> > 
> > Alon , 
> > current qemu image on hosts is :
> > qemu-img-rhev-2.6.0-28.el7_3.8.x86_64
> > 
> > Please add procedure to downgrade qemu-img & to what build .
> 
> Running "yum list qemu-img" should give you a list of available versions.
> You can then use "yum downgrade qemu-img-<version>" to downgrade it. Let's
> try for something older than qemu-img-rhev-2.6.0-27.el7.x86_64 that was
> reported in bug 1435813.

Allon, I tried to downgrade qemu but it failed. This is what I did; please advise.

[root@storage-ge4-vdsm3 ~]# yum list qemu-img
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Available Packages
qemu-img.x86_64                                                                                             10:1.5.3-126.el7_3.5                                                                                              rhel-7.3-zstream

[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img-rhev-1.5.3-126.el7_3.5.x86_64
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
No package qemu-img-rhev-1.5.3-126.el7_3.5.x86_64 available.
Error: Nothing to do


[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img-10:1.5.3-126.el7_3.5
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
No package qemu-img-10:1.5.3-126.el7_3.5 available.
Error: Nothing to do

[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img-1.5.3-126.el7_3.5
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
No Match for available package: 10:qemu-img-1.5.3-126.el7_3.5.x86_64
Nothing to do

[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img.x86_64
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
No Match for available package: 10:qemu-img-1.5.3-126.el7.x86_64
Nothing to do

[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img-rhev.x86_64 
Display all 816 possibilities? (y or n)
[root@storage-ge4-vdsm3 ~]# yum downgrade qemu-img-rhev.x86_64 
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package qemu-img-rhev.x86_64 10:2.6.0-28.el7_3.6 will be a downgrade
---> Package qemu-img-rhev.x86_64 10:2.6.0-28.el7_3.8 will be erased
--> Finished Dependency Resolution
Error: Package: 10:qemu-kvm-rhev-2.6.0-28.el7_3.8.x86_64 (@rhevh-73)
           Requires: qemu-img-rhev = 10:2.6.0-28.el7_3.8
           Removing: 10:qemu-img-rhev-2.6.0-28.el7_3.8.x86_64 (@rhevh-73)
               qemu-img-rhev = 10:2.6.0-28.el7_3.8
           Downgraded By: 10:qemu-img-rhev-2.6.0-28.el7_3.6.x86_64 (openstack_deps_repo)
               qemu-img-rhev = 10:2.6.0-28.el7_3.6
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Comment 10 Fred Rolland 2017-03-28 08:50:17 UTC
*** Bug 1431613 has been marked as a duplicate of this bug. ***

Comment 11 Red Hat Bugzilla Rules Engine 2017-04-02 12:41:19 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 12 Fred Rolland 2017-04-03 14:39:57 UTC
Created attachment 1268406 [details]
strace of qemu convert with data on source LV

Comment 13 Fred Rolland 2017-04-03 14:40:38 UTC
Created attachment 1268407 [details]
strace of qemu convert with no data on source LV

Comment 14 Fred Rolland 2017-04-03 14:59:03 UTC
This issue will happen if the source LV has data on all of its clusters.
If the LV is empty (zeros), the convert will not fail.

In some cases, the LV has 'old' data that has not been discarded in the storage backend; in that case even an empty disk will fail to convert to QCOW because we create the destination too small.

In the attached strace output files, we can see that the convert is reading and writing data. After zeroing the LV, the convert succeeds.

The VDSM fix is needed to create the destination LV with the right size (including the QCOW overhead).
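
The sizing idea can be sketched as follows. This is a hypothetical helper, not the actual VDSM code from the gerrit patches. It assumes the roughly 10% QCOW overhead mentioned in this comment and assumes, for illustration, that LV sizes are rounded up to 128 MiB extents:

```python
# Hypothetical sketch of sizing a destination LV for a qcow2 image.
MiB = 1024 ** 2
GiB = 1024 ** 3

QCOW_OVERHEAD_PERCENT = 10   # from this comment: QCOW needs about 10% more
EXTENT_SIZE = 128 * MiB      # assumption: LV sizes are rounded up to 128 MiB extents


def estimate_qcow2_lv_size(virtual_size,
                           overhead_percent=QCOW_OVERHEAD_PERCENT,
                           extent=EXTENT_SIZE):
    """Return an LV size in bytes for a fully allocated qcow2 image."""
    # Add the overhead using integer arithmetic, rounding up.
    needed = (virtual_size * (100 + overhead_percent) + 99) // 100
    # Round up to a whole number of extents.
    extents = (needed + extent - 1) // extent
    return extents * extent


size = estimate_qcow2_lv_size(1 * GiB)
print("destination LV for a 1 GiB raw source: %d MiB" % (size // MiB))
```

With these assumed values, a 1 GiB raw source maps to a 1152 MiB destination instead of the 1 GiB destination that ran out of space.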


In the use case below, we created an empty raw disk of 1 GB and tried to convert it to QCOW on a 1 GB disk.
It fails on the first run, as the QCOW format needs 10% more.
After zeroing the source disk, it succeeds even on the 1 GB destination disk.

Here are the commands we used:

strace -f -o convertstrace.txt qemu-img convert -t none -T none -f raw /dev/02a0ceec-0dd8-4c37-a3fc-73901d55f9a5/39677750-92eb-4f27-a9aa-bdeeea9ab1b4 -O qcow2 -o compat=1.1 /dev/02a0ceec-0dd8-4c37-a3fc-73901d55f9a5/c8787835-253c-43ed-bae7-4a8e172d9066

dd if=/dev/zero of=/dev/02a0ceec-0dd8-4c37-a3fc-73901d55f9a5/39677750-92eb-4f27-a9aa-bdeeea9ab1b4 bs=8M oflag=direct

strace -f -o convertstrace_empty.txt qemu-img convert -t none -T none -f raw /dev/02a0ceec-0dd8-4c37-a3fc-73901d55f9a5/39677750-92eb-4f27-a9aa-bdeeea9ab1b4 -O qcow2 -o compat=1.1 /dev/02a0ceec-0dd8-4c37-a3fc-73901d55f9a5/c8787835-253c-43ed-bae7-4a8e172d9066
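
Whether a source LV still holds non-zero data, which is the condition that separates the failing run from the passing one above, could be checked before converting. This is a minimal sketch with a hypothetical helper name; it reads the whole device, so it is only practical for small test volumes:

```python
def has_data(path, chunk_size=8 * 1024 * 1024):
    """Return True if the file or block device at path has any non-zero byte."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached the end without finding a non-zero byte.
                return False
            if chunk.count(b"\0") != len(chunk):
                return True
```

On a host it would be called with the LV path, for example has_data("/dev/<vg>/<lv>"), by a user with read access to the device.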

Comment 15 Fred Rolland 2017-04-03 15:00:04 UTC
It also happens on version 4.0 (tested on the QE team setup).

Comment 16 Red Hat Bugzilla Rules Engine 2017-04-03 15:11:16 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 17 Tal Nisan 2017-04-03 15:16:15 UTC
Based on Freddy and Nir's investigation, this is not a regression.

Comment 18 Avihai 2017-04-27 05:31:25 UTC
Verified on build 4.1.2.

