Description of problem:

A live merge failed because the engine did not tell the host running the VM to refresh the volume, even though the volume was extended successfully on the SPM. The engine proceeded directly from the Extend command to the Merge command; no Refresh can be seen.

Initial scenario:
1. VM had a 200G disk (raw)
2. Snapshot created
3. Disk extended to 300G (leaf is qcow2)

Deleting the snapshot fails. The engine sends the ExtendImageSize command to increase the base size to 300G (which works fine), but it never sends the RefreshVolume command to the host running the VM. So even though the image was extended for the merge, the host running the VM still sees the old size (200G).

The merge fails:

jsonrpc.Executor/3::ERROR::2018-05-11 07:45:37,398::vm::4967::virt.vm::(merge) vmId=`84d8465f-df72-4f80-a45c-8bee9feb66e2`::Live merge failed (job: c5369f25-e17c-405d-b96c-661fa1f9d679)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 4963, in merge
    flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 668, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command 'block-commit': Top image /rhev/data-center/9ede6309-174a-4dec-95ef-73ce342542b6/3c0e67db-ccee-4bdb-81a8-19d908e8f05d/images/67d2cfd4-9792-4edb-af9b-ac7210318996/ab8a38c4-729f-4074-a600-84b96ce6ca7a is larger than base image /rhev/data-center/9ede6309-174a-4dec-95ef-73ce342542b6/3c0e67db-ccee-4bdb-81a8-19d908e8f05d/images/67d2cfd4-9792-4edb-af9b-ac7210318996/d487966c-78f6-41fc-8a51-affb14f6f1c8, and resize of base image failed: Invalid argument

Version-Release number of selected component (if applicable):
rhevm-4.1.5.2-0.1.el7.noarch
vdsm-4.17.35-1.el7ev.noarch

How reproducible:
Unknown

NOTE: Data-Center/Cluster level is 3.6
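
The condition that QEMU complains about in the description (top image larger than base image as seen from the host running the VM) could be checked by hand along the following lines. This is only a rough sketch, not what vdsm or the engine does: the helper names `virtual_size` and `base_needs_resize` are made up for illustration, and it assumes `qemu-img` is installed on the host. Newer `qemu-img` versions may also need `-U` to inspect an image that a running VM has open.

```python
import json
import subprocess

GiB = 1024 ** 3


def virtual_size(path):
    """Return the virtual size in bytes that qemu-img reports for an image.

    Hypothetical helper: it shells out to `qemu-img info --output=json`.
    """
    out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
    return json.loads(out.decode("utf-8"))["virtual-size"]


def base_needs_resize(top_size, base_size):
    """True when the top image is larger than the base image.

    In that case block-commit has to grow the base first, which is the
    step that failed with 'Invalid argument' in the traceback.
    """
    return top_size > base_size


if __name__ == "__main__":
    # Sizes from the scenario: leaf at 300G, base still seen as 200G.
    print(base_needs_resize(300 * GiB, 200 * GiB))
```

To compare real volumes, pass the two image paths from the error message to `virtual_size` on the host running the VM and feed the results to `base_needs_resize`.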
2018-05-09 07:19:46,592+12 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (pool-6-thread-2) [7eb16c83-2493-40a4-b158-694cdbc860e2] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected : ID: 3c0e67db-ccee-4bdb-81a8-19d908e8f05d Type: Storage

Extend (SPM)
2018-05-09 07:19:46,781+12 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand] (pool-6-thread-4) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, ExtendImageSizeVDSCommand( ExtendImageSizeVDSCommandParameters:{runAsync='true', storagePoolId='9ede6309-174a-4dec-95ef-73ce342542b6', ignoreFailoverLimit='false'}), log id: 720acf8e

Refresh (HSM)
???

Merge (HSM)
2018-05-09 07:19:55,087+12 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-6-thread-5) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, MergeVDSCommand(HostName = pxlswh01, MergeVDSCommandParameters:{runAsync='true', hostId='b76d1a36-3d4b-4fee-96dc-1f370acf0888', vmId='84d8465f-df72-4f80-a45c-8bee9feb66e2', storagePoolId='9ede6309-174a-4dec-95ef-73ce342542b6', storageDomainId='3c0e67db-ccee-4bdb-81a8-19d908e8f05d', imageGroupId='67d2cfd4-9792-4edb-af9b-ac7210318996', imageId='ab8a38c4-729f-4074-a600-84b96ce6ca7a', baseImageId='d487966c-78f6-41fc-8a51-affb14f6f1c8', topImageId='ab8a38c4-729f-4074-a600-84b96ce6ca7a', bandwidth='0'}), log id: 266af902
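
The manual reading of the engine log above (Extend, then Merge, with nothing in between) can be reproduced with a small script. This is a sketch under assumptions: the function names are invented for illustration, the sample lines are shortened copies of the log entries above, and matching a refresh on the substring `RefreshVolume` is a guess at how such a command would appear in the log. A later comment in this thread notes that the base volume refresh happens on the vdsm side, so a missing entry in the engine log is not conclusive on its own.

```python
import re

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
CORRELATION = re.compile(r"\[(" + UUID + r")\]")  # e.g. [7eb16c83-...]
STARTED = re.compile(r"START, (\w+)\(")           # e.g. START, MergeVDSCommand(


def started_commands(lines):
    """Group the names of started VDS commands by correlation id, in log order."""
    flows = {}
    for line in lines:
        corr = CORRELATION.search(line)
        cmd = STARTED.search(line)
        if corr and cmd:
            flows.setdefault(corr.group(1), []).append(cmd.group(1))
    return flows


def merge_without_refresh(commands):
    """True if a Merge follows an ExtendImageSize with no RefreshVolume between."""
    extended = False
    for cmd in commands:
        if cmd.startswith("ExtendImageSize"):
            extended = True
        elif "RefreshVolume" in cmd:  # assumed substring, see lead-in
            extended = False
        elif cmd.startswith("Merge") and extended:
            return True
    return False


# Shortened copies of the two START lines above (parameters trimmed).
SAMPLE = [
    "2018-05-09 07:19:46,781+12 INFO "
    "[org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand] "
    "(pool-6-thread-4) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, "
    "ExtendImageSizeVDSCommand( ExtendImageSizeVDSCommandParameters:"
    "{runAsync='true'}), log id: 720acf8e",
    "2018-05-09 07:19:55,087+12 INFO "
    "[org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] "
    "(pool-6-thread-5) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, "
    "MergeVDSCommand(HostName = pxlswh01, MergeVDSCommandParameters:"
    "{runAsync='true'}), log id: 266af902",
]

if __name__ == "__main__":
    for flow_id, commands in started_commands(SAMPLE).items():
        print(flow_id, commands, merge_without_refresh(commands))
```

Grouping by correlation id keeps unrelated flows in the same log from being mixed into the Extend/Merge sequence.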
Ah, this is unfortunate ...
This is a duplicate of bug 1232481, which was fixed in Vdsm 4.17.36 (the version reported in the bug description is 4.17.35).

There is some confusion between bug 1232481 and bug 1367281.
However, the patch fixing this issue in 3.6 is https://gerrit.ovirt.org/#/c/63634/, which is included in the following branches/tags:

Branches: ovirt-3.6, ovirt-3.6-async

Tags: v4.17.36, v4.17.37, v4.17.38, v4.17.39, v4.17.40, v4.17.41, v4.17.42, v4.17.43, v4.17.43.1, v4.17.44

Germano,

Can you please confirm this?
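
For checking hosts against the fixed version named above, a minimal sketch could look like this. The function names and the parsing rule are assumptions for illustration only, and the comparison is only meaningful within the 4.17.z series listed in the tags.

```python
import re

# First vdsm version carrying the fix, per the comment above.
FIXED_IN = (4, 17, 36)


def vdsm_version(text):
    """Extract the dotted version from a package name or tag.

    "vdsm-4.17.35-1.el7ev.noarch" -> (4, 17, 35)
    "v4.17.43.1"                  -> (4, 17, 43, 1)
    """
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(text):
    """True if the version is at or above the first fixed 4.17.z release."""
    return vdsm_version(text) >= FIXED_IN


if __name__ == "__main__":
    # The version from the bug description, then the first fixed tag.
    print(has_fix("vdsm-4.17.35-1.el7ev.noarch"))
    print(has_fix("v4.17.36"))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "4.17.9" sorting after "4.17.36".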
(In reply to Ala Hino from comment #5)
> Ah, this is unfortunate ...
> This is a duplicate of bug 1232481 that was fixed in Vdsm 4.17.36 (version
> reported in the bug description is 4.17.35).
>
> There is a confusion between bug 1232481 and bug 1367281.
> However, the patch fixing this issue in 3.6 is
> https://gerrit.ovirt.org/#/c/63634/ that is included in the following
> branches/tags:
>
> Branches: ovirt-3.6, ovirt-3.6-async
>
> Tags: v4.17.36, v4.17.37, v4.17.38, v4.17.39, v4.17.40, v4.17.41, v4.17.42,
> v4.17.43, v4.17.43.1, v4.17.44
>
>
> Germano,
>
> Can you please confirm this?

Hi Ala,

Ahhh. So it's not the engine that sends the RefreshVolume command to the host running the VM in this case?

I ask because I looked at https://gerrit.ovirt.org/#/c/47671/, and it does look like the engine should send the refresh command after an extension during a live merge, which was not sent in this case.

So with a 4.1 engine and a 3.6 vdsm, the command is not sent and the user needs the patched vdsm. Is this correct?
Hi Germano,

The base volume refresh is done on the vdsm side.
The refresh in that patch is done **not** after extending the base volume; rather, it is done if the base volume is ILLEGAL. I am not sure what that patch was trying to target.

The user does need to upgrade vdsm to get the fix.
Hi Ala,

Thanks for clarifying and sorry for the confusion.

*** This bug has been marked as a duplicate of bug 1232481 ***
This bug is a duplicate of bug 1232481, which has qe_test_coverage+