Bug 1363734

Summary: Failure to create a template with QCOW2 disks on iscsi domains (RHEL 7.3 LVM changed return code)
Product: [oVirt] ovirt-engine
Component: BLL.Storage
Reporter: Carlos Mestre González <cmestreg>
Assignee: Nir Soffer <nsoffer>
QA Contact: Carlos Mestre González <cmestreg>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Version: 3.6.7
Target Milestone: ovirt-4.0.4
Target Release: 4.0.4
Keywords: Regression
CC: acanan, agk, alitke, amureini, bugs, cmestreg, gklein, gveitmic, hhan, nsoffer, pzhang, tnisan, yanyang, ylavi
Flags: rule-engine: ovirt-4.0.z+
rule-engine: blocker+
ylavi: planning_ack+
amureini: devel_ack+
acanan: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: v4.18.12
Clones: 1375079 (view as bug list)
Last Closed: 2016-09-26 12:33:46 UTC
Type: Bug
oVirt Team: Storage
Bug Blocks: 1375079
Attachments:
engine.log and vdsm.log from spm
lvextend return code test (run as root)

Description Carlos Mestre González 2016-08-03 12:53:09 UTC
Created attachment 1187068 [details]
engine.log and vdsm.log from spm

Description of problem:
Creating a template from a VM with two disks (one preallocated, one thin provisioned), with both disks selected as QCOW2, fails on an iSCSI domain.

Version-Release number of selected component (if applicable):
rhevm-3.6.8.1-0.1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with one preallocated disk and one thin provisioned disk.
2. Click Make Template and select QCOW2 for both disks.

Actual results:
Operation fails with:
jsonrpc.Executor/4::ERROR::2016-08-03 15:39:12,707::task::866::Storage.TaskManager.Task::(_setError) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain: 'image=59c909fb-af26-4f7d-a12a-55f4cab34c8c, domain=f9ca6106-2b60-44c2-8da3-f4b75ba7a00d'
jsonrpc.Executor/4::DEBUG::2016-08-03 15:39:12,708::task::885::Storage.TaskManager.Task::(_run) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Task._run: 2a34a6f0-fabe-43cb-990b-3af058fcbcbe ('f9ca6106-2b60-44c2-8da3-f4b75ba7a00d', 'bf9ebc38-f0e3-49d0-90ea-815a04347a42', '59c909fb-af26-4f7d-a12a-55f4cab34c8c', 'false', 'false') {} failed - stopping task



Expected results:
Succeeds.

Additional info:
If I select RAW for the disks it works; selecting QCOW2 disks on an NFS domain also works.

Not sure if this is a vdsm error; assigning it to you guys for assessment.

Comment 1 Allon Mureinik 2016-08-03 12:59:25 UTC
How is this done? via API? GUI?

Comment 2 Carlos Mestre González 2016-08-03 13:05:08 UTC
Done via GUI.

1. Create the VM with a name and the Blank template.
2. Via the Disks tab, add two new disks: one preallocated, one thin provisioned.
3. Click Make Template and select QCOW2 for both disks.

Comment 3 Carlos Mestre González 2016-08-03 13:49:22 UTC
Hosts in the env are RHEL 7.3:

vdsm-4.17.33-1.el7ev.noarch
3.10.0-481.el7.x86_64

Comment 4 Adam Litke 2016-08-03 19:23:17 UTC
It looks like lvm changed the return code it uses when you attempt an lvextend to the same size. We expect to receive rc=3 in this case, but on this system we got rc=5, so rc=5 is handled as an unexpected LVM error which aborts the copyImage operation.

It is normal behavior for us to request an extension of the same size here:
                # To avoid 'prezeroing' preallocated volume on NFS domain,
                # we create the target volume with minimal size and after that
                # we'll change its metadata back to the original size.
                tmpSize = TEMPORARY_VOLUME_SIZE  # in sectors (10M)
                destDom.createVolume(
                    imgUUID=dstImgUUID, size=tmpSize, volFormat=dstVolFormat,
                    preallocate=volParams['prealloc'],
                    diskType=volParams['disktype'], volUUID=dstVolUUID,
                    desc=descr, srcImgUUID=volume.BLANK_UUID,
                    srcVolUUID=volume.BLANK_UUID)

                dstVol = sdCache.produce(dstSdUUID).produceVolume(
                    imgUUID=dstImgUUID, volUUID=dstVolUUID)
                # For convert to 'raw' we need use the virtual disk size
                # instead of apparent size
                if dstVolFormat == volume.RAW_FORMAT:
                    newsize = volParams['size']
                else:
                    newsize = volParams['apparentsize']
                dstVol.extend(newsize)

The underlying blockVolume._create rounds the given TEMPORARY_VOLUME_SIZE back up to 1G (see _calculate_volume_alloc_size), so if your disks are 1G then dstVol is already the correct size.

I don't think this will happen for disks larger than 1G.
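
To make the 1G boundary concrete, here is a minimal sketch of the rounding effect (the constants and alloc_size are illustrative assumptions, not vdsm's actual _calculate_volume_alloc_size):

    SECTOR_SIZE = 512
    TEMPORARY_VOLUME_SIZE = 20480   # in sectors (10M), as in the snippet above
    MIN_ALLOC = 1024 ** 3           # assumed 1G minimum block volume allocation

    def alloc_size(requested_sectors):
        # Round the requested temporary size up to the minimum allocation.
        return max(requested_sectors * SECTOR_SIZE, MIN_ALLOC)

    # For a 1G disk the temporary volume is already 1G, so the later
    # dstVol.extend(newsize) asks lvextend for the size the LV already has.
    assert alloc_size(TEMPORARY_VOLUME_SIZE) == 1024 ** 3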


When it works:

0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::300::Storage.Misc.excCmd::(cmd) FAILED: <err> = "  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  New size (8 extents) matches existing size (8 extents)\n  Run `lvextend --help' for more information.\n"; <rc> = 3
0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::1164::Storage.LVM::(_resizeLV) New size (in extents) matches existing size (in extents).

When it fails:

e52de423-1767-4a4f-9052-75b299bb6089::DEBUG::2016-08-03 15:34:23,118::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  WARNING: Not using lvmetad because config setting use_lvmetad=0.\n  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\n  New size (8 extents) matches existing size (8 extents).\n'; <rc> = 5
e52de423-1767-4a4f-9052-75b299bb6089::ERROR::2016-08-03 15:34:23,183::image::844::Storage.Image::(copyCollapsed) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 839, in copyCollapsed
    dstVol.extend(newsize)
  File "/usr/share/vdsm/storage/blockVolume.py", line 310, in extend
    lvm.extendLV(self.sdUUID, self.volUUID, sizemb)
  File "/usr/share/vdsm/storage/lvm.py", line 1179, in extendLV
    _resizeLV("lvextend", vgName, lvName, size)
  File "/usr/share/vdsm/storage/lvm.py", line 1175, in _resizeLV
    raise se.LogicalVolumeExtendError(vgName, lvName, "%sM" % (size, ))
LogicalVolumeExtendError: Logical Volume extend failed: 'vgname=661c8d75-642f-4677-b4f2-a1ce983049db lvname=e49b6738-3461-4511-90a7-4abb18820573 newsize=1024M'
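
For reference, the vdsm-side handling implied by the logs and traceback above works roughly like this (a simplified sketch, not the exact lvm.py code; _run is a hypothetical helper):

    def _resizeLV(cmd, vgName, lvName, size_mb):
        # Run e.g. "lvextend -L <size>m <vg>/<lv>" via a hypothetical helper.
        rc, out, err = _run([cmd, "-L", "%sm" % size_mb,
                             "%s/%s" % (vgName, lvName)])
        if rc == 0:
            return
        if rc == 3:
            # Historical lvm return code for "new size matches existing size";
            # the newer lvm builds return 5 here, falling through to the raise.
            log.debug("New size (in extents) matches existing size "
                      "(in extents).")
            return
        raise se.LogicalVolumeExtendError(vgName, lvName, "%sM" % (size_mb,))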

Comment 5 Yaniv Kaul 2016-08-03 21:07:54 UTC
1. Do we know why LVM changed their RC? Isn't it part of a STABLE API?
Can we ask them?
(I really don't like relying on a message - that is something that is not a stable API).

2. I assume this is needed in both 3.6.x and 4.0.

Comment 6 Allon Mureinik 2016-08-04 10:19:26 UTC
Messages are definitely *not* a stable API (see, e.g., deb6f6a06c2fa25b1c9aa4c6eeb9529ce7c5c4c3).

Carlos, what lvm2 version are you using?
IIUC, this change was not deliberate, and commit e1112227d19cc252e1404eb4bf44d03e3867e463 (which is included in 2.02.161) should fix this issue.

Comment 7 Carlos Mestre González 2016-08-04 10:47:05 UTC
# rpm -qa | grep lvm2
lvm2-libs-2.02.162-1.el7.x86_64
lvm2-2.02.162-1.el7.x86_64

Comment 8 Carlos Mestre González 2016-08-04 10:50:31 UTC
(In reply to Yaniv Kaul from comment #5)
> 2. I assume this is needed in both 3.6.x and 4.0.

In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created successfully.

Comment 9 Carlos Mestre González 2016-08-04 10:51:36 UTC
Putting back needinfo for Adam from comment 5.

Comment 10 Allon Mureinik 2016-08-04 10:56:41 UTC
(In reply to Carlos Mestre González from comment #8)
> (In reply to Yaniv Kaul from comment #5)
> > 2. I assume this is needed in both 3.6.x and 4.0.
> 
> In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created
> successfully.
With the same lvm2 packages?

Comment 11 Carlos Mestre González 2016-08-04 11:05:51 UTC
(In reply to Allon Mureinik from comment #10)
> (In reply to Carlos Mestre González from comment #8)
> > (In reply to Yaniv Kaul from comment #5)
> > > 2. I assume this is needed in both 3.6.x and 4.0.
> > 
> > In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created
> > successfully.
> With the same lvm2 packages?
Allon, you're right, I've got outdated lvm2 packages on my hosts:
# rpm -qa | grep lvm2
lvm2-libs-2.02.130-5.el7_2.5.x86_64
lvm2-2.02.130-5.el7_2.5.x86_64

I'll update my env just to confirm.

Comment 12 Adam Litke 2016-08-04 17:27:48 UTC
Created attachment 1187594 [details]
lvextend return code test (run as root)

This test case can be used to check the return code of the lvextend command when the current (same) size is used. We expect this to be 3.
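
A hypothetical reconstruction of such a test follows (the actual attachment is not reproduced here); run as root against a scratch VG you can afford to touch:

    #!/usr/bin/env python
    import subprocess
    import sys

    def run(*cmd):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
        out, err = p.communicate()
        return p.returncode, out, err

    def main(vg):
        lv = "%s/lvextend_rc_test" % vg
        rc, out, err = run("lvcreate", "--name", "lvextend_rc_test",
                           "-L", "128m", vg)
        if rc != 0:
            sys.stderr.write(err)
            return rc
        try:
            # Extending to the current size should be a no-op: vdsm expects
            # rc=3 here; the broken lvm builds return rc=5 instead.
            rc, out, err = run("lvextend", "-L", "128m", lv)
            sys.stderr.write(err)
            print("lvextend rc=%d (vdsm expects 3)" % rc)
        finally:
            run("lvremove", "-f", lv)
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1]))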

Comment 13 Adam Litke 2016-08-04 17:34:17 UTC
The code in question (checking for rc=3) has been in vdsm for a long time. Any host running a bad version of lvm2 will be affected (RHEL 7.3 seems to be the only affected supported platform so far, but this could impact Fedora 24 until fixed).

I agree that the proper solution is to fix lvm2 if we still have time.

Comment 14 Adam Litke 2016-08-04 19:42:27 UTC
I installed RHEL-7.3 from RHEL-7.3-20160729.1-Server-x86_64-dvd1.iso which has lvm2-2.02.161-3.el7.x86_64.  The test script attached to this bug reports rc=5 (broken).

lvm2-2.02.162-1.el7.x86_64 has the same behavior.

Comment 15 Allon Mureinik 2016-08-08 08:11:21 UTC
*** Bug 1364339 has been marked as a duplicate of this bug. ***

Comment 16 Nir Soffer 2016-08-15 21:48:45 UTC
We no longer depend on bug 1365186; instead we're fixing our error handling to work with any lvm version.
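
One rc-independent approach looks like the sketch below (not necessarily the exact patch; _run and getLV stand in for vdsm internals): on lvextend failure, re-read the LV and treat the operation as successful if the LV is already at least the requested size.

    def extendLV(vgName, lvName, size_mb):
        rc, out, err = _run(["lvextend", "-L", "%sm" % size_mb,
                             "%s/%s" % (vgName, lvName)])  # hypothetical helper
        if rc == 0:
            return
        # The "new size matches existing size" case is reported as rc=3 by
        # older lvm and rc=5 by newer builds, so verify against the LV itself
        # instead of trusting the return code.
        lv = getLV(vgName, lvName)  # hypothetical helper returning LV metadata
        if int(lv.size) >= size_mb * 1024 * 1024:
            return
        raise se.LogicalVolumeExtendError(vgName, lvName, "%sM" % (size_mb,))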

Comment 18 Carlos Mestre González 2016-09-13 13:34:41 UTC
Verified with:
lvm2-libs-2.02.165-1.el7.x86_64
lvm2-2.02.165-1.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64