Created attachment 1187068 [details]
engine.log and vdsm.log from spm

Description of problem:
Creating a template from a VM with two disks (one preallocated, one thin) as QCOW2 disks on an iSCSI domain fails.

Version-Release number of selected component (if applicable):
rhevm-3.6.8.1-0.1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with one preallocated disk and one thin disk
2. Click Make Template and select QCOW2 for both disks

Actual results:
The operation fails with:

jsonrpc.Executor/4::ERROR::2016-08-03 15:39:12,707::task::866::Storage.TaskManager.Task::(_setError) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1519, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain: 'image=59c909fb-af26-4f7d-a12a-55f4cab34c8c, domain=f9ca6106-2b60-44c2-8da3-f4b75ba7a00d'
jsonrpc.Executor/4::DEBUG::2016-08-03 15:39:12,708::task::885::Storage.TaskManager.Task::(_run) Task=`2a34a6f0-fabe-43cb-990b-3af058fcbcbe`::Task._run: 2a34a6f0-fabe-43cb-990b-3af058fcbcbe ('f9ca6106-2b60-44c2-8da3-f4b75ba7a00d', 'bf9ebc38-f0e3-49d0-90ea-815a04347a42', '59c909fb-af26-4f7d-a12a-55f4cab34c8c', 'false', 'false') {} failed - stopping task

Expected results:
The template is created successfully.

Additional info:
Selecting RAW for the disks works, as does selecting QCOW2 disks on an NFS domain. Not sure whether this is a vdsm error; assigning it to you guys for assessment.
How is this done? via API? GUI?
Done via GUI:
1. Create the VM with a name and the Blank template
2. Via the Disks tab, add two new disks, one preallocated and one thin
3. Click Make Template and select QCOW2 for both disks
Hosts in the env are RHEL 7.3:
vdsm-4.17.33-1.el7ev.noarch
3.10.0-481.el7.x86_64
It looks like lvm changed the return code it uses when you attempt an lvextend to the same size. We expect rc=3 in this case, but on this system we got rc=5, so rc=5 is handled as an unexpected LVM error, which aborts the copyImage operation. It is normal behavior for us to request an extension to the same size here:

    # To avoid 'prezeroing' preallocated volume on NFS domain,
    # we create the target volume with minimal size and after that
    # we'll change its metadata back to the original size.
    tmpSize = TEMPORARY_VOLUME_SIZE  # in sectors (10M)
    destDom.createVolume(
        imgUUID=dstImgUUID, size=tmpSize, volFormat=dstVolFormat,
        preallocate=volParams['prealloc'],
        diskType=volParams['disktype'], volUUID=dstVolUUID,
        desc=descr, srcImgUUID=volume.BLANK_UUID,
        srcVolUUID=volume.BLANK_UUID)

    dstVol = sdCache.produce(dstSdUUID).produceVolume(
        imgUUID=dstImgUUID, volUUID=dstVolUUID)

    # For convert to 'raw' we need use the virtual disk size
    # instead of apparent size
    if dstVolFormat == volume.RAW_FORMAT:
        newsize = volParams['size']
    else:
        newsize = volParams['apparentsize']
    dstVol.extend(newsize)

The underlying blockVolume._create rounds the given TEMPORARY_VOLUME_SIZE back up to 1G (see _calculate_volume_alloc_size), so if your disks are 1G then dstVol is already the correct size. I don't think this will happen for disks larger than 1G.

When it works:

0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::300::Storage.Misc.excCmd::(cmd) FAILED: <err> = " WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n New size (8 extents) matches existing size (8 extents)\n Run `lvextend --help' for more information.\n"; <rc> = 3
0c3ae7dd-6ae5-483f-a22e-8230c60d414f::DEBUG::2016-08-03 18:36:35,219::lvm::1164::Storage.LVM::(_resizeLV) New size (in extents) matches existing size (in extents).
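The 1G rounding mentioned above (the reason 1G disks happen to dodge the failing extend) can be sketched as follows. The names here are hypothetical and this only illustrates the clamping effect; vdsm's real _calculate_volume_alloc_size also accounts for the volume format:

```python
SECTOR_SIZE = 512          # bytes per sector, as vdsm sizes are in sectors
MIN_LV_SIZE_MB = 1024      # the 1G minimum referred to above

def alloc_size_mb(requested_sectors):
    """Round a requested size (in sectors) up to whole MiB and clamp it
    to the minimal block-volume size. Illustrative sketch only."""
    requested_mb = -(-requested_sectors * SECTOR_SIZE // 2**20)  # ceil
    return max(requested_mb, MIN_LV_SIZE_MB)

# A 10M temporary volume is allocated as 1024M, so for a 1G disk the LV
# already has the final size and the later extend is a same-size lvextend.
```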
When it fails:

e52de423-1767-4a4f-9052-75b299bb6089::DEBUG::2016-08-03 15:34:23,118::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' WARNING: Not using lvmetad because config setting use_lvmetad=0.\n WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\n New size (8 extents) matches existing size (8 extents).\n'; <rc> = 5
e52de423-1767-4a4f-9052-75b299bb6089::ERROR::2016-08-03 15:34:23,183::image::844::Storage.Image::(copyCollapsed) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 839, in copyCollapsed
    dstVol.extend(newsize)
  File "/usr/share/vdsm/storage/blockVolume.py", line 310, in extend
    lvm.extendLV(self.sdUUID, self.volUUID, sizemb)
  File "/usr/share/vdsm/storage/lvm.py", line 1179, in extendLV
    _resizeLV("lvextend", vgName, lvName, size)
  File "/usr/share/vdsm/storage/lvm.py", line 1175, in _resizeLV
    raise se.LogicalVolumeExtendError(vgName, lvName, "%sM" % (size, ))
LogicalVolumeExtendError: Logical Volume extend failed: 'vgname=661c8d75-642f-4677-b4f2-a1ce983049db lvname=e49b6738-3461-4511-90a7-4abb18820573 newsize=1024M'
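One way to tolerate both return codes is to recognize the same-size no-op from the stderr text rather than the exact rc. This is only an illustrative sketch, not the actual vdsm patch, and matching on the message has its own stability problems:

```python
SAME_SIZE_MARKER = "matches existing size"

def extend_is_noop(rc, err):
    """Return True when lvextend failed only because the LV is already
    at the requested size. Older lvm2 builds report rc=3 here; the
    lvm2 2.02.161/162 builds above report rc=5 with the same message."""
    if rc == 0:
        return False  # a real resize happened, not a no-op
    return SAME_SIZE_MARKER in err

def resize_lv_ok(rc, err):
    """Overall success test for a same-size-tolerant _resizeLV."""
    return rc == 0 or extend_is_noop(rc, err)
```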
1. Do we know why LVM changed their RC? Isn't it part of a STABLE API? Can we ask them? (I really don't like relying on a message - that is something that is not a stable API.)
2. I assume this is needed in both 3.6.x and 4.0.
Messages are definitely *not* a stable API (see, e.g., deb6f6a06c2fa25b1c9aa4c6eeb9529ce7c5c4c3). Carlos, what lvm2 version are you using? IIUC, this change was not deliberate, and commit e1112227d19cc252e1404eb4bf44d03e3867e463 (which is included in 2.0.2-161) should fix this issue.
# rpm -qa | grep lvm2
lvm2-libs-2.02.162-1.el7.x86_64
lvm2-2.02.162-1.el7.x86_64
(In reply to Yaniv Kaul from comment #5)
> 2. I assume this is needed in both 3.6.x and 4.0.

In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created successfully.
putting back need info for Adam from comment 5.
(In reply to Carlos Mestre González from comment #8)
> (In reply to Yaniv Kaul from comment #5)
> > 2. I assume this is needed in both 3.6.x and 4.0.
>
> In 4.0 (rhevm-4.0.2-0.1.rc.el7ev.noarch) the template is created
> successfully.

With the same lvm2 packages?
(In reply to Allon Mureinik from comment #10)
> With the same lvm2 packages?

Allon, you're right, I've got outdated lvm2 packages on my hosts:

# rpm -qa | grep lvm2
lvm2-libs-2.02.130-5.el7_2.5.x86_64
lvm2-2.02.130-5.el7_2.5.x86_64

I'll update my env to confirm.
Created attachment 1187594 [details]
lvextend return code test (run as root)

This test case can be used to check the return code of the lvextend command when the current (same) size is requested. We expect this to be 3.
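The attached script is roughly equivalent to the following sketch. The helper names are hypothetical, and probe() must be run as root against an existing LV that is already exactly size_mb large:

```python
import subprocess

def lvextend_cmd(vg, lv, size_mb):
    # The same-size lvextend whose return code we are probing.
    return ["lvextend", "--size", "%dm" % size_mb, "%s/%s" % (vg, lv)]

def classify_rc(rc):
    # rc=3 is the historical "size unchanged" code vdsm expects;
    # rc=5 is the regressed behavior seen with lvm2 2.02.161/162.
    return {3: "expected", 5: "regressed"}.get(rc, "other")

def probe(vg, lv, size_mb):
    """Run the same-size lvextend and classify its return code."""
    p = subprocess.Popen(lvextend_cmd(vg, lv, size_mb),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p.communicate()
    return classify_rc(p.returncode)
```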
The code in question (checking for rc=3) has been in vdsm for a long time. Any host that is running a bad version of lvm2 will be affected (RHEL-7.3 seems to be the only affected supported platform so far but this could impact Fedora 24 until fixed). I agree that the proper solution is to fix lvm2 if we still have time.
I installed RHEL-7.3 from RHEL-7.3-20160729.1-Server-x86_64-dvd1.iso which has lvm2-2.02.161-3.el7.x86_64. The test script attached to this bug reports rc=5 (broken). lvm2-2.02.162-1.el7.x86_64 has the same behavior.
*** Bug 1364339 has been marked as a duplicate of this bug. ***
We no longer depend on bug 1365186; instead we are fixing our error handling to work with any lvm version.
Verified with:
lvm2-libs-2.02.165-1.el7.x86_64
lvm2-2.02.165-1.el7.x86_64
vdsm-4.18.13-1.el7ev.x86_64