Bug 1363734 - Failure to create a template with QCOW2 disks on iscsi domains (RHEL 7.3 LVM changed return code)
Summary: Failure to create a template with QCOW2 disks on iscsi domains (RHEL 7.3 LVM changed return code)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-4.0.4
Target Release: 4.0.4
Assignee: Nir Soffer
QA Contact: Carlos Mestre González
URL:
Whiteboard: storage
Depends On:
Blocks: 1375079
 
Reported: 2016-08-03 12:53 UTC by Carlos Mestre González
Modified: 2017-02-17 02:46 UTC
CC List: 14 users

Fixed In Version: v4.18.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1375079
Environment:
Last Closed: 2016-09-26 12:33:46 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: blocker+
ylavi: planning_ack+
amureini: devel_ack+
acanan: testing_ack+


Attachments
engine.log and vdsm.log from spm (651.15 KB, application/x-gzip)
2016-08-03 12:53 UTC, Carlos Mestre González
lvextend return code test (run as root) (576 bytes, text/plain)
2016-08-04 17:27 UTC, Adam Litke


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2933491 0 None None None 2017-02-17 02:46:11 UTC
oVirt gerrit 61917 0 master ABANDONED lvm: Do not rely on return code when handling extendLV errors 2016-08-29 12:22:18 UTC
oVirt gerrit 62338 0 master MERGED lvm: Fix error handling when resizing lvs 2016-08-23 22:02:47 UTC
oVirt gerrit 62734 0 ovirt-3.6 MERGED lvm: Fix error handling when resizing lvs 2016-08-29 12:59:34 UTC
oVirt gerrit 62735 0 ovirt-3.6 MERGED lvm: Separate lv reduce and extend 2016-08-29 12:59:53 UTC
oVirt gerrit 62739 0 ovirt-4.0 MERGED lvm: Fix error handling when resizing lvs 2016-08-29 12:07:13 UTC
oVirt gerrit 62740 0 ovirt-4.0 MERGED lvm: Separate lv reduce and extend 2016-08-29 12:07:44 UTC

Description Carlos Mestre González 2016-08-03 12:53:09 UTC
Created attachment 1187068 [details]
engine.log and vdsm.log from spm

Description of problem:
Creating a template from a VM with two disks (one preallocated, one thin provisioned), with both disks set to QCOW2, fails on an iSCSI domain.

Version-Release number of selected component (if applicable):
rhevm-3.6.8.1-0.1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with one preallocated disk and one thin disk
2. Click on Make Template and select QCOW2 for both disks

Actual results:
Operation fails with:
jsonrpc.Executor/4::ERROR::2016-08-03 15:39:12,707::task::866::Storage.TaskManager.Task::(_setError) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain: 'image=59c909fb-af26-4f7d-a12a-55f4cab34c8c, domain=f9ca6106-2b60-44c2-8da3-f4b75ba7a00d'
jsonrpc.Executor/4::DEBUG::2016-08-03 15:39:12,708::task::885::Storage.TaskManager.Task::(_run) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Task._run: 2a34a6f0-fabe-43cb-990b-3af058fcbcbe ('f9ca6106-2b60-44c2-8da3-f4b75ba7a00d', 'bf9ebc38-f0e3-49d0-90ea-815a04347a42', '59c909fb-af26-4f7d-a12a-55f4cab34c8c', 'false', 'false') {} failed - stopping task



Expected results:
Succeeds.

Additional info:
If I select the disks as RAW it works; selecting QCOW2 disks on an NFS domain also works.

Not sure if this is a vdsm error; assigning it to you guys for assessment.

Comment 1 Allon Mureinik 2016-08-03 12:59:25 UTC
How is this done? Via API? GUI?

Comment 2 Carlos Mestre González 2016-08-03 13:05:08 UTC
Done via GUI.

1. Create the VM with a name and the Blank template
2. Via the disk tab, add two new disks, one preallocated and one thin
3. Click on Make Template and select QCOW2 for both disks

Comment 3 Carlos Mestre González 2016-08-03 13:49:22 UTC
Hosts in the env are RHEL 7.3:

vdsm-4.17.33-1.el7ev.noarch
3.10.0-481.el7.x86_64

Comment 4 Adam Litke 2016-08-03 19:23:17 UTC
It looks like lvm changed the return code it uses when you attempt an lvextend to the current size. We expect to receive rc=3 in this case, but on this system we got rc=5, so rc=5 is handled as an unexpected LVM error, which aborts the copyImage operation.

It is normal behavior for us to request an extension of the same size here:
                # To avoid 'prezeroing' preallocated volume on NFS domain,
                # we create the target volume with minimal size and after that
                # we'll change its metadata back to the original size.
                tmpSize = TEMPORARY_VOLUME_SIZE  # in sectors (10M)
                destDom.createVolume(
                    imgUUID=dstImgUUID, size=tmpSize, volFormat=dstVolFormat,
                    preallocate=volParams['prealloc'],
                    diskType=volParams['disktype'], volUUID=dstVolUUID,
                    desc=descr, srcImgUUID=volume.BLANK_UUID,
                    srcVolUUID=volume.BLANK_UUID)

                dstVol = sdCache.produce(dstSdUUID).produceVolume(
                    imgUUID=dstImgUUID, volUUID=dstVolUUID)
                # For convert to 'raw' we need use the virtual disk size
                # instead of apparent size
                if dstVolFormat == volume.RAW_FORMAT:
                    newsize = volParams['size']
                else:
                    newsize = volParams['apparentsize']
                dstVol.extend(newsize)

The underlying blockVolume._create rounds the given TEMPORARY_VOLUME_SIZE up to 1G (see _calculate_volume_alloc_size), so if your disks are 1G then dstVol is already the correct size.

I don't think this will happen for disks larger than 1G.
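
For reference, a minimal sketch of that rounding (the constants and the 1G minimum are assumptions based on the description above, not verbatim vdsm code):

SECTOR_SIZE = 512
MIN_ALLOC_MB = 1024                       # assumed 1G minimum initial size
TEMPORARY_VOLUME_SIZE = 20480             # 10M in 512-byte sectors

def calculate_volume_alloc_size(size_in_sectors):
    # Convert sectors to MB, rounding up, then apply the 1G floor.
    size_mb = (size_in_sectors * SECTOR_SIZE + 2 ** 20 - 1) // 2 ** 20
    return max(size_mb, MIN_ALLOC_MB)

# A 10M temporary request comes out as 1024M, so for a 1G disk the later
# dstVol.extend(newsize) asks for exactly the size the LV already has.
assert calculate_volume_alloc_size(TEMPORARY_VOLUME_SIZE) == 1024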


When it works:

0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::300::Storage.Misc.excCmd::(cmd) FAILED: <err> = "  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  New size (8 extents) matches existing size (8 extents)\n  Run `lvextend --help' for more information.\n"; <rc> = 3
0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::1164::Storage.LVM::(_resizeLV) New size (in extents) matches existing size (in extents).

When it fails:

e52de423-1767-4a4f-9052-75b299bb6089::DEBUG::2016-08-03 15:34:23,118::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: Not using lvmetad because config setting use_lvmetad=0.\n  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\n  New size (8 extents) matches existing size (8 extents).\n'; <rc> = 5
e52de423-1767-4a4f-9052-75b299bb6089::ERROR::2016-08-03 15:34:23,183::image::844::Storage.Image::(copyCollapsed) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 839, in copyCollapsed
    dstVol.extend(newsize)
  File "/usr/share/vdsm/storage/blockVolume.py", line 310, in extend
    lvm.extendLV(self.sdUUID, self.volUUID, sizemb)
  File "/usr/share/vdsm/storage/lvm.py", line 1179, in extendLV
    _resizeLV("lvextend", vgName, lvName, size)
  File "/usr/share/vdsm/storage/lvm.py", line 1175, in _resizeLV
    raise se.LogicalVolumeExtendError(vgName, lvName, "%sM" % (size, ))
LogicalVolumeExtendError: Logical Volume extend failed: 'vgname=661c8d75-642f-4677-b4f2-a1ce983049db lvname=e49b6738-3461-4511-90a7-4abb18820573 newsize=1024M'
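
For illustration, a simplified sketch of the check that breaks (names are simplified, not the exact vdsm source); only rc=3 is recognized as the benign same-size case, so the new rc=5 falls through to the error path:

import logging
import subprocess

log = logging.getLogger("Storage.LVM")

class LogicalVolumeExtendError(Exception):
    pass

def resize_lv(vg_name, lv_name, size_mb):
    cmd = ["lvextend", "--size", "%sm" % size_mb,
           "%s/%s" % (vg_name, lv_name)]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return
    if p.returncode == 3:
        # Older lvm2 returned 3 when the new size matched the current one.
        log.debug("New size (in extents) matches existing size (in extents).")
        return
    # lvm2 on RHEL 7.3 returns 5 for the same benign condition, so we end
    # up here and the whole copyImage flow is aborted.
    raise LogicalVolumeExtendError(
        "vgname=%s lvname=%s newsize=%sM" % (vg_name, lv_name, size_mb))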

Comment 5 Yaniv Kaul 2016-08-03 21:07:54 UTC
1. Do we know why LVM changed their RC? Isn't it part of a STABLE API?
Can we ask them?
(I really don't like relying on a message - that is something that is not a stable API).

2. I assume this is needed in both 3.6.x and 4.0.

Comment 6 Allon Mureinik 2016-08-04 10:19:26 UTC
Messages are definitely *not* a stable API (see, e.g., deb6f6a06c2fa25b1c9aa4c6eeb9529ce7c5c4c3).

Carlos, what lvm2 version are you using?
IIUC, this change was not deliberate, and commit e1112227d19cc252e1404eb4bf44d03e3867e463 (which is included in 2.02.161) should fix this issue.

Comment 7 Carlos Mestre González 2016-08-04 10:47:05 UTC
# rpm -qa | grep lvm2
lvm2-libs-2.02.162-1.el7.x86_64
lvm2-2.02.162-1.el7.x86_64

Comment 8 Carlos Mestre González 2016-08-04 10:50:31 UTC
(In reply to Yaniv Kaul from comment #5)
> 2. I assume this is needed in both 3.6.x and 4.0.

In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created successfully.

Comment 9 Carlos Mestre González 2016-08-04 10:51:36 UTC
Putting back needinfo for Adam from comment 5.

Comment 10 Allon Mureinik 2016-08-04 10:56:41 UTC
(In reply to Carlos Mestre González from comment #8)
> (In reply to Yaniv Kaul from comment #5)
> > 2. I assume this is needed in both 3.6.x and 4.0.
> 
> In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created
> successfully.
With the same lvm2 packages?

Comment 11 Carlos Mestre González 2016-08-04 11:05:51 UTC
(In reply to Allon Mureinik from comment #10)
> (In reply to Carlos Mestre González from comment #8)
> > (In reply to Yaniv Kaul from comment #5)
> > > 2. I assume this is needed in both 3.6.x and 4.0.
> > 
> > In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created
> > successfully.
> With the same lvm2 packages?
Allon, you're right, I've got outdated lvm2 packages on my hosts:
# rpm -qa | grep lvm2
lvm2-libs-2.02.130-5.el7_2.5.x86_64
lvm2-2.02.130-5.el7_2.5.x86_64

I'll update my env just to confirm.

Comment 12 Adam Litke 2016-08-04 17:27:48 UTC
Created attachment 1187594 [details]
lvextend return code test (run as root)

This test case can be used to check the return code of the lvextend command when the current (same) size is requested. We expect this to be 3.
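
For those without the attachment handy, an equivalent check can be sketched in Python (the VG/LV names are placeholders; run as root against a scratch VG whose LV is already at the requested size):

import subprocess

VG = "testvg"   # placeholder: an existing scratch VG
LV = "testlv"   # placeholder: an LV already extended to 128m

# Ask lvextend for the size the LV already has and report the return code.
p = subprocess.Popen(
    ["lvextend", "--size", "128m", "%s/%s" % (VG, LV)],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print("rc=%d" % p.returncode)   # 3 on good lvm2 builds, 5 on broken ones
print(err.decode().strip())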

Comment 13 Adam Litke 2016-08-04 17:34:17 UTC
The code in question (checking for rc=3) has been in vdsm for a long time. Any host running a bad version of lvm2 will be affected (RHEL 7.3 seems to be the only affected supported platform so far, but this could impact Fedora 24 until fixed).

I agree that the proper solution is to fix lvm2 if we still have time.

Comment 14 Adam Litke 2016-08-04 19:42:27 UTC
I installed RHEL-7.3 from RHEL-7.3-20160729.1-Server-x86_64-dvd1.iso which has lvm2-2.02.161-3.el7.x86_64.  The test script attached to this bug reports rc=5 (broken).

lvm2-2.02.162-1.el7.x86_64 has the same behavior.

Comment 15 Allon Mureinik 2016-08-08 08:11:21 UTC
*** Bug 1364339 has been marked as a duplicate of this bug. ***

Comment 16 Nir Soffer 2016-08-15 21:48:45 UTC
We don't depend on bug 1365186; we are fixing our error handling to work with any lvm version.
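
In other words, on failure re-check the actual LV size instead of interpreting the return code. A minimal sketch of that idea (helper layout and names are mine, not the merged patch):

import subprocess

class LogicalVolumeExtendError(Exception):
    pass

def _run_lvm(args):
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out, err

def _lv_size_mb(vg_name, lv_name):
    # Read the current LV size in megabytes via lvs.
    rc, out, err = _run_lvm(
        ["lvs", "--noheadings", "--nosuffix", "--units", "m",
         "-o", "size", "%s/%s" % (vg_name, lv_name)])
    return float(out.decode())

def extend_lv(vg_name, lv_name, size_mb):
    rc, out, err = _run_lvm(
        ["lvextend", "--size", "%sm" % size_mb,
         "%s/%s" % (vg_name, lv_name)])
    if rc == 0:
        return
    # Don't trust the return code: if the LV is already at least the
    # requested size, the failed no-op extend is benign on any lvm version.
    if _lv_size_mb(vg_name, lv_name) >= size_mb:
        return
    raise LogicalVolumeExtendError(
        "vgname=%s lvname=%s newsize=%sM" % (vg_name, lv_name, size_mb))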

Comment 18 Carlos Mestre González 2016-09-13 13:34:41 UTC
Verified with:
lvm2-libs-2.02.165-1.el7.x86_64
lvm2-2.02.165-1.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64

