Bug 1577060 - Live merge failed because HSM did not refresh the base volume after extension
Keywords:
Status: CLOSED DUPLICATE of bug 1232481
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.4
Target Release: ---
Assignee: Ala Hino
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-11 06:39 UTC by Germano Veit Michel
Modified: 2021-06-10 16:10 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-16 04:10:34 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1488453 0 None None None 2018-05-12 23:33:27 UTC

Description Germano Veit Michel 2018-05-11 06:39:15 UTC
Description of problem:

A live merge failed because the engine did not tell the host running the VM to refresh the volume, even though the volume was extended successfully on the SPM. The engine proceeded directly from the Extend command to the Merge command; no Refresh can be seen.

Initial Scenario:
1. VM had 200G disk (raw)
2. Snapshot created
3. Disk extended to 300G (leaf is qcow2)

Deleting the snapshot fails. The engine sends the ExtendImageSize command to increase the base size to 300G (which works fine), but it never sends the RefreshVolume command to the host running the VM. So even though the image was extended for the merge, the host running the VM still sees the old size (200G).

The merge fails:

jsonrpc.Executor/3::ERROR::2018-05-11 07:45:37,398::vm::4967::virt.vm::(merge) vmId=`84d8465f-df72-4f80-a45c-8bee9feb66e2`::Live merge failed (job: c5369f25-e17c-405d-b96c-661fa1f9d679)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 4963, in merge
    flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1313, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 668, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirtError: internal error: unable to execute QEMU command 'block-commit': Top image /rhev/data-center/9ede6309-174a-4dec-95ef-73ce342542b6/3c0e67db-ccee-4bdb-81a8-19d908e8f05d/images/67d2cfd4-9792-4edb-af9b-ac7210318996/ab8a38c4-729f-4074-a600-84b96ce6ca7a is larger than base image /rhev/data-center/9ede6309-174a-4dec-95ef-73ce342542b6/3c0e67db-ccee-4bdb-81a8-19d908e8f05d/images/67d2cfd4-9792-4edb-af9b-ac7210318996/d487966c-78f6-41fc-8a51-affb14f6f1c8, and resize of base image failed: Invalid argument
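
The error above comes down to a size mismatch: the top image is larger than the base as seen by the host running the VM. A minimal sketch of how that condition could be checked by hand follows. It assumes `qemu-img` is installed and that both volume paths are readable on the host; the helper names `virtual_size` and `base_needs_resize` are made up for illustration and are not part of vdsm.

```python
import json
import subprocess


def virtual_size(path):
    """Return the virtual size in bytes that qemu-img reports for an image."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", path])
    return json.loads(out)["virtual-size"]


def base_needs_resize(base_size, top_size):
    """True when the top layer is larger than the base it would be merged into."""
    return top_size > base_size


GiB = 1024 ** 3
# Sizes from the scenario in this report: base still seen as 200G,
# leaf already extended to 300G.
print(base_needs_resize(200 * GiB, 300 * GiB))
```

On a real host the two arguments would come from `virtual_size(base_path)` and `virtual_size(top_path)`. Note that this only reports what the image metadata says; it does not show what size the running QEMU process believes the base to be.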

Version-Release number of selected component (if applicable):
rhevm-4.1.5.2-0.1.el7.noarch
vdsm-4.17.35-1.el7ev.noarch

How reproducible:
Unknown

NOTE: Data-Center/Cluster level is 3.6

Comment 3 Germano Veit Michel 2018-05-11 06:47:19 UTC
2018-05-09 07:19:46,592+12 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (pool-6-thread-2) [7eb16c83-2493-40a4-b158-694cdbc860e2] Running command: RemoveSnapshotSingleDiskLiveCommand internal: true. Entities affected :  ID: 3c0e67db-ccee-4bdb-81a8-19d908e8f05d Type: Storage 

Extend (SPM)
2018-05-09 07:19:46,781+12 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand] (pool-6-thread-4) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, ExtendImageSizeVDSCommand( ExtendImageSizeVDSCommandParameters:{runAsync='true', storagePoolId='9ede6309-174a-4dec-95ef-73ce342542b6', ignoreFailoverLimit='false'}), log id: 720acf8e

Refresh (HSM)
???

Merge (HSM)

2018-05-09 07:19:55,087+12 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-6-thread-5) [7eb16c83-2493-40a4-b158-694cdbc860e2] START, MergeVDSCommand(HostName = pxlswh01, MergeVDSCommandParameters:{runAsync='true', hostId='b76d1a36-3d4b-4fee-96dc-1f370acf0888', vmId='84d8465f-df72-4f80-a45c-8bee9feb66e2', storagePoolId='9ede6309-174a-4dec-95ef-73ce342542b6', storageDomainId='3c0e67db-ccee-4bdb-81a8-19d908e8f05d', imageGroupId='67d2cfd4-9792-4edb-af9b-ac7210318996', imageId='ab8a38c4-729f-4074-a600-84b96ce6ca7a', baseImageId='d487966c-78f6-41fc-8a51-affb14f6f1c8', topImageId='ab8a38c4-729f-4074-a600-84b96ce6ca7a', bandwidth='0'}), log id: 266af902
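
The sequence above was put together by following one correlation ID through engine.log. A rough sketch of doing that with a script is shown here. The regular expression is an assumption based only on the three log lines quoted in this comment, the sample lines are abbreviated copies of them, and `commands_for` is a hypothetical helper name.

```python
import re

# Captures the logger class name and the correlation ID from an engine.log
# line. The layout is inferred from the lines quoted in this comment only.
LINE_RE = re.compile(
    r"\[(?P<cls>[\w.]+)\]\s+\([^)]*\)\s+\[(?P<corr>[0-9a-f-]+)\]")


def commands_for(lines, correlation_id):
    """Return the short class names logged under one correlation ID, in order."""
    commands = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("corr") == correlation_id:
            commands.append(m.group("cls").rsplit(".", 1)[-1])
    return commands


FLOW_ID = "7eb16c83-2493-40a4-b158-694cdbc860e2"

# Abbreviated copies of the log lines quoted in this comment.
SAMPLE = [
    "2018-05-09 07:19:46,592+12 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (pool-6-thread-2) [%s] Running command: ..." % FLOW_ID,
    "2018-05-09 07:19:46,781+12 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ExtendImageSizeVDSCommand] (pool-6-thread-4) [%s] START, ..." % FLOW_ID,
    "2018-05-09 07:19:55,087+12 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-6-thread-5) [%s] START, ..." % FLOW_ID,
]

for name in commands_for(SAMPLE, FLOW_ID):
    print(name)
```

For a real log, `SAMPLE` would be replaced by the open file object. The output only lists what the engine logged; a step missing from the list does not by itself show where that step was supposed to happen.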

Comment 5 Ala Hino 2018-05-15 10:33:08 UTC
Ah, this is unfortunate ...
This is a duplicate of bug 1232481 that was fixed in Vdsm 4.17.36 (version reported in the bug description is 4.17.35).

There is some confusion between bug 1232481 and bug 1367281.
However, the patch fixing this issue in 3.6 is https://gerrit.ovirt.org/#/c/63634/, which is included in the following branches/tags:

Branches: ovirt-3.6, ovirt-3.6-async

Tags: v4.17.36, v4.17.37, v4.17.38, v4.17.39, v4.17.40, v4.17.41, v4.17.42, v4.17.43, v4.17.43.1, v4.17.44
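
For checking hosts against the tag list above, a small sketch that compares an installed vdsm package name with the first fixed tag (v4.17.36) could look like this. The package name format is taken from the bug description; `vdsm_version` and `has_fix` are hypothetical helper names. The plain numeric comparison is only grounded for the 4.17 tags listed here; whether later series carry the fix is an assumption.

```python
import re

FIXED_IN = (4, 17, 36)  # first tag listed as containing the fix


def vdsm_version(package):
    """Extract the numeric version from a name like 'vdsm-4.17.35-1.el7ev.noarch'."""
    m = re.match(r"vdsm-(\d+(?:\.\d+)*)-", package)
    if m is None:
        raise ValueError("not a vdsm package name: %r" % package)
    return tuple(int(part) for part in m.group(1).split("."))


def has_fix(package):
    """True if the package version is at or above the first fixed tag."""
    return vdsm_version(package) >= FIXED_IN


# Version from the bug description.
print(has_fix("vdsm-4.17.35-1.el7ev.noarch"))
```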


Germano,

Can you please confirm this?

Comment 6 Germano Veit Michel 2018-05-15 23:01:51 UTC
(In reply to Ala Hino from comment #5)
> Ah, this is unfortunate ...
> This is a duplicate of bug 1232481 that was fixed in Vdsm 4.17.36 (version
> reported in the bug description is 4.17.35).
> 
> There is some confusion between bug 1232481 and bug 1367281.
> However, the patch fixing this issue in 3.6 is
> https://gerrit.ovirt.org/#/c/63634/, which is included in the following
> branches/tags:
> 
> Branches: ovirt-3.6, ovirt-3.6-async
> 
> Tags: v4.17.36, v4.17.37, v4.17.38, v4.17.39, v4.17.40, v4.17.41, v4.17.42,
> v4.17.43, v4.17.43.1, v4.17.44
> 
> 
> Germano,
> 
> Can you please confirm this?

Hi Ala,

Ahhh. So it's not the engine that sends the RefreshVolume command to the host running the VM in this case?

I ask because I looked at https://gerrit.ovirt.org/#/c/47671/ and it does look like the engine should send the refresh command after an extension during a live merge, and that command was not sent in this case.

So with a 4.1 engine and a 3.6 vdsm the command is not sent, and the user needs the patched vdsm. Is this correct?

Comment 7 Ala Hino 2018-05-16 03:59:29 UTC
Hi Germano,

The base volume refresh is done on the vdsm side.
The refresh in that patch is **not** done after extending the base volume; it is done if the base volume is ILLEGAL. I am not sure what that patch was trying to target.

The user does need to upgrade vdsm to get the fix.

Comment 8 Germano Veit Michel 2018-05-16 04:10:34 UTC
Hi Ala,

Thanks for clarifying and sorry for the confusion.

*** This bug has been marked as a duplicate of bug 1232481 ***

Comment 9 Elad 2018-08-02 11:15:09 UTC
This bug is a DUP of bug 1232481, which has qe_test_coverage+

Comment 10 Franta Kust 2019-05-16 13:03:43 UTC
BZ<2>Jira Resync

