Bug 1802277 - Live merge succeeds on vdsm side, but volume cannot be deleted (VolumeImageHasChildren: Cannot delete volume which has children)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.8-2
Assignee: Vojtech Juranek
QA Contact: Evelina Shames
URL:
Whiteboard:
Duplicates: 1814318
Depends On:
Blocks:
 
Reported: 2020-02-12 19:04 UTC by Vojtech Juranek
Modified: 2021-09-24 09:45 UTC
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-25 07:32:16 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm log (301.46 KB, application/x-xz), 2020-02-12 19:04 UTC, Vojtech Juranek
logs (5.14 MB, application/zip), 2020-09-24 05:49 UTC, Evelina Shames


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4831111 0 None None None 2020-02-17 00:03:23 UTC
oVirt gerrit 106914 0 master MERGED libvirtconnection: Ensure stable block job events 2021-02-16 10:00:20 UTC
oVirt gerrit 106915 0 master MERGED vm: Store drive name in block job dict 2021-02-16 10:00:20 UTC
oVirt gerrit 106916 0 master MERGED vm: Log job UUID for tracked block job events 2021-02-16 10:00:20 UTC

Description Vojtech Juranek 2020-02-12 19:04:40 UTC
Created attachment 1662758 [details]
vdsm log

Description of problem:
When running in a cycle that creates snapshots and deletes (live merges) the old ones, it can fail with the exception below. After that, the disk is in an illegal state and cannot be deleted.


Version-Release number of selected component (if applicable):
ovirt-engine-4.3.8.2-1.el7.noarch
vdsm-4.30.40-3.git7d8f244.el7.x86_64

How reproducible:
Randomly (hit it only once)

Steps to Reproduce:
1. create an initial snapshot
2. run in a loop:
2.1  create a snapshot
2.2  delete (live merge) the original, older snapshot

Hit by running the following script (a minimal sketch of such a loop is shown below):
https://github.com/vjuranek/ovirt-bug-reproducers/blob/master/BZ1796415/snapshots.py
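
For illustration only, a minimal sketch of such a create/delete loop using the oVirt Python SDK (ovirtsdk4); this is not the linked script, and the engine URL, credentials, VM name, loop count and timeouts are illustrative assumptions:

    import time
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    # Assumed connection details - adjust for the actual environment.
    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=test-vm')[0]   # assumed VM name
    snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

    def wait_until(predicate, timeout=600, interval=5):
        # Poll with a deadline instead of waiting forever.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return
            time.sleep(interval)
        raise RuntimeError('timed out')

    def snapshot_ok(snap_id):
        snap = snapshots_service.snapshot_service(snap_id).get()
        return snap.snapshot_status == types.SnapshotStatus.OK

    # 1. create an initial snapshot
    previous = snapshots_service.add(types.Snapshot(description='snap-0',
                                                    persist_memorystate=False))
    wait_until(lambda: snapshot_ok(previous.id))

    # 2. loop: create a new snapshot, then live merge (delete) the older one
    for i in range(1, 100):
        current = snapshots_service.add(types.Snapshot(description='snap-%d' % i,
                                                       persist_memorystate=False))
        wait_until(lambda: snapshot_ok(current.id))
        snapshots_service.snapshot_service(previous.id).remove()
        # the snapshot disappears from the list once the live merge finishes
        wait_until(lambda: previous.id not in [s.id for s in snapshots_service.list()])
        previous = current

    connection.close()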

Actual results:
The merge of the snapshot fails and the disk is left in an illegal state.

Expected results:
The older snapshot is merged and deleted.

Additional info:

engine log exception:

2020-02-07 13:19:44,329-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] Failed in 'MergeVDS' method
2020-02-07 13:19:44,362-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM centos-7.7 command MergeVDS failed: Merge failed
2020-02-07 13:19:44,362-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand' return value 'StatusOnlyReturn [status=Status [code=52, message=Merge failed]]'
2020-02-07 13:19:44,362-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] HostName = centos-7.7
2020-02-07 13:19:44,362-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] Command 'MergeVDSCommand(HostName = centos-7.7, MergeVDSCommandParameters:{hostId='b74ec1cb-d9f7-4e04-b673-06a35ebc7ad2', vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9', storagePoolId='0b600aa6-42a1-11ea-b6bf-525400f551af', storageDomainId='75bcadd7-d61f-4c53-ad0e-259460d8b10d', imageGroupId='5d407f5f-a758-4b60-9633-c41285ac024c', imageId='a9d83317-fefa-4182-9f33-a443628d773f', baseImageId='5346ebcd-80c6-4696-b376-1fd16cff6d34', topImageId='18c43c63-6610-4aca-80d7-d5139f536e68', bandwidth='0'})' execution failed: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52
2020-02-07 13:19:44,362-05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] FINISH, MergeVDSCommand, return: , log id: 52195393
2020-02-07 13:19:44,362-05 ERROR [org.ovirt.engine.core.bll.MergeCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [374307c4-f4dd-42fc-9a4e-32a43f656c57] Engine exception thrown while sending merge command: org.ovirt.engine.core.common.errors.EngineException: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MergeVDS, error = Merge failed, code = 52 (Failed with error mergeErr and code 52)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:118) [bll.jar:]


vdsm error:

2020-02-07 13:19:48,975-0500 ERROR (tasks/5) [storage.TaskManager.Task] (Task='1108934c-f381-4128-91d8-b08ce8d6759f') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1967, in deleteVolume
    vol.delete(postZero=postZero, force=force, discard=discard)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 585, in delete
    self.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1292, in validateDelete
    self._manifest.validateDelete()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 545, in validateDelete
    raise se.VolumeImageHasChildren(self)
VolumeImageHasChildren: Cannot delete volume which has children (non-ethical): ['sdUUID: 75bcadd7-d61f-4c53-ad0e-259460d8b10d', u'imgUUID: 5d407f5f-a758-4b60-9633-c41285ac024c', u'volUUID: 18c43c63-6610-4aca-80d7-d5139f536e68']


This is caused by a wrong image chain stored in the engine DB as well as in the volume metadata. In both cases the stored chain is
5346ebcd-80c6-4696-b376-1fd16cff6d34 -> 18c43c63-6610-4aca-80d7-d5139f536e68 -> a9d83317-fefa-4182-9f33-a443628d773f

Engine:

engine=# select image_guid::text, size, parentid, volume_type, active from images where image_guid in ('5346ebcd-80c6-4696-b376-1fd16cff6d34', '18c43c63-6610-4aca-80d7-d5139f536e68', 'a9d83317-fefa-4182-9f33-a443628d773f');
              image_guid              |    size    |               parentid               | volume_type | active 
--------------------------------------+------------+--------------------------------------+-------------+--------
 a9d83317-fefa-4182-9f33-a443628d773f | 1073741824 | 18c43c63-6610-4aca-80d7-d5139f536e68 |           2 | t
 18c43c63-6610-4aca-80d7-d5139f536e68 | 1073741824 | 5346ebcd-80c6-4696-b376-1fd16cff6d34 |           2 | f
 5346ebcd-80c6-4696-b376-1fd16cff6d34 | 1073741824 | 00000000-0000-0000-0000-000000000000 |           2 | f


vdsm:

[root@localhost ~]# lvs -o tags 75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-a443628d773f
  LV Tags                                                                             
  IU_5d407f5f-a758-4b60-9633-c41285ac024c,MD_1,PU_18c43c63-6610-4aca-80d7-d5139f536e68

[root@localhost ~]# lvs -o tags 75bcadd7-d61f-4c53-ad0e-259460d8b10d/18c43c63-6610-4aca-80d7-d5139f536e68
  LV Tags                                                                             
  IU_5d407f5f-a758-4b60-9633-c41285ac024c,MD_5,PU_5346ebcd-80c6-4696-b376-1fd16cff6d34

[root@localhost ~]# lvs -o tags 75bcadd7-d61f-4c53-ad0e-259460d8b10d/5346ebcd-80c6-4696-b376-1fd16cff6d34
  LV Tags                                                                             
  IU_5d407f5f-a758-4b60-9633-c41285ac024c,MD_8,PU_00000000-0000-0000-0000-000000000000
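
As an aside, the parent chain can be reconstructed mechanically from these tags (IU_ is the image UUID, MD_ the metadata slot, PU_ the parent UUID). A minimal sketch of such a helper, not vdsm code; the VG and top volume UUIDs are the ones from this report:

    import subprocess

    def parent_of(vg, lv):
        # Read the PU_ tag of an LV, i.e. the parent volume UUID.
        out = subprocess.check_output(
            ["lvs", "--noheadings", "-o", "tags", "%s/%s" % (vg, lv)])
        for tag in out.decode().strip().split(","):
            if tag.startswith("PU_"):
                return tag[len("PU_"):]
        return None

    vg = "75bcadd7-d61f-4c53-ad0e-259460d8b10d"
    top = "a9d83317-fefa-4182-9f33-a443628d773f"

    # Walk from the top volume towards the base until the all-zero parent UUID.
    chain = [top]
    while chain[-1] != "00000000-0000-0000-0000-000000000000":
        chain.append(parent_of(vg, chain[-1]))

    # Print base on the left, top on the right (drop the all-zero terminator).
    print(" <- ".join(reversed(chain[:-1])))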

In reality, the chain seen by qemu-img is
5346ebcd-80c6-4696-b376-1fd16cff6d34 -> a9d83317-fefa-4182-9f33-a443628d773f
5346ebcd-80c6-4696-b376-1fd16cff6d34 -> 18c43c63-6610-4aca-80d7-d5139f536e68

[root@localhost images]# qemu-img info 5d407f5f-a758-4b60-9633-c41285ac024c/a9d83317-fefa-4182-9f33-a443628d773f
image: 5d407f5f-a758-4b60-9633-c41285ac024c/a9d83317-fefa-4182-9f33-a443628d773f
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 0
cluster_size: 65536
backing file: 5346ebcd-80c6-4696-b376-1fd16cff6d34 (actual path: 5d407f5f-a758-4b60-9633-c41285ac024c/5346ebcd-80c6-4696-b376-1fd16cff6d34)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

[root@localhost images]# qemu-img info 5d407f5f-a758-4b60-9633-c41285ac024c/18c43c63-6610-4aca-80d7-d5139f536e68
image: 5d407f5f-a758-4b60-9633-c41285ac024c/18c43c63-6610-4aca-80d7-d5139f536e68
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 0
cluster_size: 65536
backing file: 5346ebcd-80c6-4696-b376-1fd16cff6d34 (actual path: 5d407f5f-a758-4b60-9633-c41285ac024c/5346ebcd-80c6-4696-b376-1fd16cff6d34)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

[root@localhost images]# qemu-img info 5d407f5f-a758-4b60-9633-c41285ac024c/5346ebcd-80c6-4696-b376-1fd16cff6d34
image: 5d407f5f-a758-4b60-9633-c41285ac024c/5346ebcd-80c6-4696-b376-1fd16cff6d34
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

Comment 1 Nir Soffer 2020-02-12 19:49:21 UTC
First, when we talk about chains, let's use the same notation as used in the
vdsm logs, otherwise it is a huge mess. We always show the base on the left
and the top on the right, and the arrows point from child to parent.

Based on the vm xml the actual chain is:

    5346ebcd-80c6-4696-b376-1fd16cff6d34 <- a9d83317-fefa-4182-9f33-a443628d773f (top)


    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/75bcadd7-d61f-4c53-ad0e-259460d8b10d/images/5d407f5f-a758-4b60-9633-c41285ac024c/a9d83317-fefa-4182-9f33-a443628d773f'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore type='block' index='1'>
        <format type='qcow2'/>
        <source dev='/rhev/data-center/mnt/blockSD/75bcadd7-d61f-4c53-ad0e-259460d8b10d/images/5d407f5f-a758-4b60-9633-c41285ac024c/5346ebcd-80c6-4696-b376-1fd16cff6d34'/>
        <backingStore/>
      </backingStore>
      <target dev='sda' bus='scsi'/>
      <serial>5d407f5f-a758-4b60-9633-c41285ac024c</serial>
      <alias name='ua-5d407f5f-a758-4b60-9633-c41285ac024c'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

The output of qemu-img info in comment 0 is not very helpful; can you attach the output of:

   qemu-img info --backing-chain /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-a443628d773f


Based on lv tags, the chain is:

    5346ebcd-80c6-4696-b376-1fd16cff6d34 <- 18c43c63-6610-4aca-80d7-d5139f536e68 <- a9d83317-fefa-4182-9f33-a443628d773f (top)

This is also matched by engine database.

So I guess what vdsm and engine show is the original chain, and we had a successful
merge, so the qemu chain was updated.

Note that vdsm does not update the lv tags when the merge completes, since
lv tags can be changed only on the SPM. Therefore vdsm updates the parent UUID
only in the volume metadata, and that update is missing here.

Please extract the volume metadata using the md slots MD_1, MD_5, MD_8.

Comment 2 Nir Soffer 2020-02-12 19:54:52 UTC
Also, where is the log from the test tool?
https://github.com/vjuranek/ovirt-bug-reproducers/blob/master/BZ1796415/snapshots.py

What was the last successful step? Did we have any error in the test script?

Comment 3 Nir Soffer 2020-02-12 20:41:17 UTC
Looking in vdsm log, the last merge seen is:

2020-02-07 07:01:26,768-0500 INFO  (jsonrpc/7) [api.virt] START merge(drive={u'imageID': u'5d407f5f-a758-4b60-9633-c41285ac024c', u'volumeID': u'a9d83317-fefa-4182-9f33-a443
628d773f', u'domainID': u'75bcadd7-d61f-4c53-ad0e-259460d8b10d', u'poolID': u'0b600aa6-42a1-11ea-b6bf-525400f551af'}, baseVolUUID=u'5346ebcd-80c6-4696-b376-1fd16cff6d34', to
pVolUUID=u'18c43c63-6610-4aca-80d7-d5139f536e68', bandwidth=u'0', jobUUID=u'a78c0528-bdd2-4389-b186-2cac7dbeb6d6') from=::ffff:192.168.122.123,43428, flow_id=e3f8a7ff-2c46-4
0de-b205-8b9bcc15e3b8, vmId=a5a486b6-4ccb-40fb-9c47-ca1885cd87a9 (api:48)

This merge fails immediately with:

2020-02-07 07:01:26,786-0500 ERROR (jsonrpc/7) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') merge: Cannot find volume 18c43c63-6610-4aca-80d7-d5139f536e68 in drive sdb's volume chain (vm:5872)

Since the volume is not in the chain at this point.

The previous merge with same topVolUUID was at:

2020-02-07 06:26:07,303-0500 INFO  (jsonrpc/2) [api.virt] START merge(drive={u'imageID': u'5d407f5f-a758-4b60-9633-c41285ac024c', u'volumeID': u'a9d83317-fefa-4182-9f33-a443
628d773f', u'domainID': u'75bcadd7-d61f-4c53-ad0e-259460d8b10d', u'poolID': u'0b600aa6-42a1-11ea-b6bf-525400f551af'}, baseVolUUID=u'5346ebcd-80c6-4696-b376-1fd16cff6d34', to
pVolUUID=u'18c43c63-6610-4aca-80d7-d5139f536e68', bandwidth=u'0', jobUUID=u'7a47233e-4ec1-4827-b1b5-1ef8c156fecb') from=::ffff:192.168.122.123,43428, flow_id=dc6b111a-6d5a-4
893-a312-115d89b0095b, vmId=a5a486b6-4ccb-40fb-9c47-ca1885cd87a9 (api:48)

Following the merge flow using the jobUUID prefix:


$ xzgrep 7a47233e vdsm.log.xz

2020-02-07 06:26:07,303-0500 INFO  (jsonrpc/2) [api.virt] START merge(drive={u'imageID': u'5d407f5f-a758-4b60-9633-c41285ac024c', u'volumeID': u'a9d83317-fefa-4182-9f33-a443628d773f', u'domainID': u'75bcadd7-d61f-4c53-ad0e-259460d8b10d', u'poolID': u'0b600aa6-42a1-11ea-b6bf-525400f551af'}, baseVolUUID=u'5346ebcd-80c6-4696-b376-1fd16cff6d34', topVolUUID=u'18c43c63-6610-4aca-80d7-d5139f536e68', bandwidth=u'0', jobUUID=u'7a47233e-4ec1-4827-b1b5-1ef8c156fecb') from=::ffff:192.168.122.123,43428, flow_id=dc6b111a-6d5a-4893-a312-115d89b0095b, vmId=a5a486b6-4ccb-40fb-9c47-ca1885cd87a9 (api:48)

2020-02-07 06:26:07,376-0500 INFO  (jsonrpc/2) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Starting merge with jobUUID=u'7a47233e-4ec1-4827-b1b5-1ef8c156fecb', original chain=5346ebcd-80c6-4696-b376-1fd16cff6d34 < 18c43c63-6610-4aca-80d7-d5139f536e68 < a9d83317-fefa-4182-9f33-a443628d773f (top), disk='sdb', base='sdb[2]', top='sdb[1]', bandwidth=0, flags=8 (vm:5942)

2020-02-07 06:26:07,428-0500 DEBUG (jsonrpc/2) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Checking job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb (vm:5739)

2020-02-07 06:26:07,436-0500 DEBUG (jsonrpc/2) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb live info: {'end': 1073741824L, 'bandwidth': 0L, 'type': 3, 'cur': 1114112L} (vm:5789)

2020-02-07 06:26:08,741-0500 DEBUG (periodic/3) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Checking job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb (vm:5739)


The libvirt job was gone here:

2020-02-07 06:26:09,175-0500 INFO  (periodic/3) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Libvirt job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb was terminated (vm:5798)

So we started a cleanup thread:

2020-02-07 06:26:09,175-0500 INFO  (periodic/3) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Starting cleanup thread for job: 7a47233e-4ec1-4827-b1b5-1ef8c156fecb (vm:5806)

But note that the libvirt BLOCK_JOB completion event was received only here:

2020-02-07 06:26:09,349-0500 INFO  (libvirt/events) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Block job COMMIT for drive /rhev/data-center/mnt/blockSD/75bcadd7-d61f-4c53-ad0e-259460d8b10d/images/5d407f5f-a758-4b60-9633-c41285ac024c/a9d83317-fefa-4182-9f33-a443628d773f has completed (vm:5835)


So the following code ran before libvirt submitted the completion event, and may
have seen an incorrect chain when getting the chain from libvirt.


2020-02-07 06:26:09,232-0500 INFO  (merge/7a47233e) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Synchronizing volume chain after live merge (job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb) (vm:6251)


Here we see that the libvirt chain still includes the volume 18c43c63-6610-4aca-80d7-d5139f536e68.

2020-02-07 06:26:09,329-0500 DEBUG (merge/7a47233e) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') vdsm chain: ['5346ebcd-80c6-4696-b376-1fd16cff6d34', '18c43c63-6610-4aca-80d7-d5139f536e68', 'a9d83317-fefa-4182-9f33-a443628d773f'], libvirt chain: ['5346ebcd-80c6-4696-b376-1fd16cff6d34', '18c43c63-6610-4aca-80d7-d5139f536e68', 'a9d83317-fefa-4182-9f33-a443628d773f'] (vm:6026)


So imageSyncVolumeChain correctly did not update the vdsm metadata.

2020-02-07 06:26:09,330-0500 INFO  (merge/7a47233e) [vdsm.api] START imageSyncVolumeChain(sdUUID=u'75bcadd7-d61f-4c53-ad0e-259460d8b10d', imgUUID=u'5d407f5f-a758-4b60-9633-c41285ac024c', volUUID=u'a9d83317-fefa-4182-9f33-a443628d773f', newChain=['5346ebcd-80c6-4696-b376-1fd16cff6d34', '18c43c63-6610-4aca-80d7-d5139f536e68', 'a9d83317-fefa-4182-9f33-a443628d773f']) from=internal, task_id=1211d26c-19c4-426a-a839-71df0c6db424 (api:48)

2020-02-07 06:26:09,629-0500 INFO  (merge/7a47233e) [storage.Image] Current chain=5346ebcd-80c6-4696-b376-1fd16cff6d34 < 18c43c63-6610-4aca-80d7-d5139f536e68 < a9d83317-fefa-4182-9f33-a443628d773f (top)  (image:1221)

2020-02-07 06:26:09,630-0500 INFO  (merge/7a47233e) [vdsm.api] FINISH imageSyncVolumeChain return=None from=internal, task_id=1211d26c-19c4-426a-a839-71df0c6db424 (api:54)

2020-02-07 06:26:09,631-0500 INFO  (merge/7a47233e) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Synchronization completed (job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb) (vm:6260)

2020-02-07 06:26:23,742-0500 INFO  (periodic/4) [virt.vm] (vmId='a5a486b6-4ccb-40fb-9c47-ca1885cd87a9') Cleanup thread <vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fcba419e3d0> successfully completed, untracking job 7a47233e-4ec1-4827-b1b5-1ef8c156fecb (base=5346ebcd-80c6-4696-b376-1fd16cff6d34, top=18c43c63-6610-4aca-80d7-d5139f536e68) (vm:5749)


So I think what happens is:

- Vdsm assumes that once libvirt block job does not exist, the block job
  is done, and getting the volume chain from libvirt using the vm xml will
  show the correct chain

- Libvirt does not show the correct chain at this point.

- After libvirt sends the BLOCK_JOB completion event, it shows a correct
  backing chain.

Vdsm is not using the completion event for detecting completion, since events
are not reliable. If vdsm is restarted during the merge, it can lose the event.

So vdsm is using the existence of the libvirt job as a reliable way to detect
that merge was completed or failed, and then check the chain to understand the
result of the merge.

Peter, can we change libvirt so block job is removed only after the backing
chain is modified?

A possible solution is to keep the block job around in special "completed" state,
and remove it only after the completion event is sent, or after some delay
(e.g. 1 second).

Comment 4 Peter Krempa 2020-02-13 09:08:36 UTC
(In reply to Nir Soffer from comment #3)

> So I think what happens is:
> 
> - Vdsm assumes that once libvirt block job does not exist, the block job
>   is done, and getting the volume chain from libvirt using the vm xml will
>   show the correct chain
> 
> - Libvirt does not show the correct chain at this point.
> 
> - After libvirt sends the BLOCK_JOB completion event, it shows a correct 
>   backing chain.

The official statement would be to wait for the block job completion event.

> Vdsm is not using the completion event for detecting completion, since events
> are not reliable. If vdsm is restarted during the merge, it can lose the
> event.

How are they not reliable? Checking the job info only on restart would be far more reliable.

> So vdsm is using the existence of the libvirt job as a reliable way to detect
> that merge was completed or failed, and then check the chain to understand
> the
> result of the merge.
>
> 
> Peter, can we change libvirt so block job is removed only after the backing
> chain is modified?

With a pre-blockdev qemu/libvirt combination qemu doesn't report any state, so the API returns nothing, because qemu no longer reports the job while libvirt hasn't finished processing it yet.

This is fixed with libvirt-5.10 and qemu-4.2, where we ask qemu to keep the job around until it's processed by the event handler, which then also fires the BLOCK_JOB_COMPLETE libvirt event, so the job API returns the job until libvirt is done processing it. The job state more or less reports proper progress statistics.

> A possible solution is to keep the block job around in special "completed"
> state,

In the pre-blockdev era we could keep the job around, but there's no sane way to report any useful progress information at that point, so the data reported to satisfy the semantics of the API would be plain garbage.

I'm not sure though how far such a thing will be backportable. It depends on internal infrastructure which was done for blockdev support and that might be too complex to backport.

> and remove it only after the completion event is sent, or after some delay
> (e.g. 1 second).

Delaying it arbitrarily is not acceptable.

Comment 5 Vojtech Juranek 2020-02-13 11:40:06 UTC
> The output of qemu-img info in comment 0 is not very helpful, can you attach
> output of:
> 
>    qemu-img info --backing-chain
> /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-
> a443628d773f
> 

    [root@localhost ~]# qemu-img info --backing-chain /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-a443628d773f
image: /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-a443628d773f
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 0
cluster_size: 65536
backing file: 5346ebcd-80c6-4696-b376-1fd16cff6d34 (actual path: /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/5346ebcd-80c6-4696-b376-1fd16cff6d34)
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

image: /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/5346ebcd-80c6-4696-b376-1fd16cff6d34
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false



> Please extract the volume metadata using the md slots MD_1, MD_5, MD_8.


# dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1 skip=$((1024 * 1024 + 1 * 8192)) iflag=direct,skip_bytes
CAP=1073741824
CTIME=1581074755
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=75bcadd7-d61f-4c53-ad0e-259460d8b10d
FORMAT=COW
GEN=0
IMAGE=5d407f5f-a758-4b60-9633-c41285ac024c
LEGALITY=LEGAL
PUUID=18c43c63-6610-4aca-80d7-d5139f536e68
TYPE=SPARSE
VOLTYPE=LEAF
EOF


# dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1 skip=$((1024 * 1024 + 5 * 8192)) iflag=direct,skip_bytes
CAP=1073741824
CTIME=1581074658
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=75bcadd7-d61f-4c53-ad0e-259460d8b10d
FORMAT=COW
GEN=0
IMAGE=5d407f5f-a758-4b60-9633-c41285ac024c
LEGALITY=LEGAL
PUUID=5346ebcd-80c6-4696-b376-1fd16cff6d34
TYPE=SPARSE
VOLTYPE=INTERNAL
EOF

# dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1 skip=$((1024 * 1024 + 8 * 8192)) iflag=direct,skip_bytes
CAP=1073741824
CTIME=1581028699
DESCRIPTION={"DiskAlias":"test-cow","DiskDescription":""}
DISKTYPE=DATA
DOMAIN=75bcadd7-d61f-4c53-ad0e-259460d8b10d
FORMAT=COW
GEN=0
IMAGE=5d407f5f-a758-4b60-9633-c41285ac024c
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL
EOF


> Also, where is the log from the test tool?
> https://github.com/vjuranek/ovirt-bug-reproducers/blob/master/BZ1796415/snapshots.py
> What was the last successful step? Did we have any error in the test script?

Unfortunately I didn't save it. But there wasn't anything interesting, there were no errors; it just hung waiting for the snapshot to be deleted.

Comment 6 Nir Soffer 2020-02-13 12:02:21 UTC
(In reply to Peter Krempa from comment #4)
> (In reply to Nir Soffer from comment #3)
> > Vdsm is not using the completion event for detecting completion, since events
> > are not reliable. If vdsm is restarted during the merge, it can lose the
> > event.
> 
> How are they not reliable? 

You can miss an event if vdsm is restarted at the right time. And when you
restart, the existence of the job is your only hint about the job.

We started to use events only last month, to make it easier to debug issues.
I think we can use them to complement the job monitoring code.

> > So vdsm is using the existence of the libvirt job as a reliable way to detect
> > that merge was completed or failed, and then check the chain to understand
> > the
> > result of the merge.
> >
> > 
> > Peter, can we change libvirt so block job is removed only after the backing
> > chain is modified?
> 
> With pre-blockdev qemu/libvirt combination

Yes, this is about libvirt in RHEL 7.7 (4.5?)

> qemu doesn't report any state, so
> the API returns nothing because qemu doesn't report the job while libvirt
> didn't finish the processing yet.

What do you mean by not reporting the job?

We assume that the libvirt block jobs are reported in virtDomainBlockJobInfo after
virDomainBlockCommit was called, and that a job which is not reported by libvirt
will never be reported again.

We also assume that libvirt xml and block jobs are synced; if block job does not
exist because a volume was removed from the chain in qemu, getting libvirt xml
at this point will not show the volume in the backing chain.

Are these assumptions correct for libvirt 4.5? libvirt 6.0?

Comment 7 Nir Soffer 2020-02-13 12:08:20 UTC
(In reply to Vojtech Juranek from comment #5)
> > The output of qemu-img info in comment 0 is not very helpful, can you attach
> > output of:
> > 
> >    qemu-img info --backing-chain
> > /dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/a9d83317-fefa-4182-9f33-
> > a443628d773f
...

Makes sense.

> > Please extract the volume metadata using the md slots MD_1, MD_5, MD_8.
> 
> 
> # dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1
> skip=$((1024 * 1024 + 1 * 8192)) iflag=direct,skip_bytes
...
> PUUID=18c43c63-6610-4aca-80d7-d5139f536e68
> VOLTYPE=LEAF
...

> # dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1
> skip=$((1024 * 1024 + 5 * 8192)) iflag=direct,skip_bytes
...
> PUUID=5346ebcd-80c6-4696-b376-1fd16cff6d34
> VOLTYPE=INTERNAL
...

> # dd if=/dev/75bcadd7-d61f-4c53-ad0e-259460d8b10d/metadata bs=512 count=1
> skip=$((1024 * 1024 + 8 * 8192)) iflag=direct,skip_bytes
...
> PUUID=00000000-0000-0000-0000-000000000000
> VOLTYPE=INTERNAL
...

Matches what we see in the logs about vdsm chain and libvirt chain. Vdsm
correctly did not update the chain.

> > Also, where is the log from the test tool?
> > https://github.com/vjuranek/ovirt-bug-reproducers/blob/master/BZ1796415/snapshots.py
> > What was the last successful step? Did we have any error in the test script?
> 
> unfortunately I didn't save it. But there wasn't anything interesting, there
> were no errors, it hanged waiting for the snapshost to be deleted

We probably need to improve the test script to detect that the snapshot delete
operation failed, instead of waiting forever.

We also need an example for the SDK showing how to do this properly.
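
A hedged sketch of what such an SDK example could look like: bound the wait for the snapshot removal and fail fast if the disk goes ILLEGAL. The passed-in services, the disk_id parameter and the timeout value are illustrative assumptions, not an official example:

    import time
    import ovirtsdk4.types as types

    TIMEOUT = 600  # seconds, assumed

    def delete_snapshot_checked(snapshots_service, disks_service, snap_id, disk_id):
        snapshots_service.snapshot_service(snap_id).remove()
        deadline = time.time() + TIMEOUT
        while time.time() < deadline:
            # The snapshot disappears from the list once the live merge finishes.
            if snap_id not in [s.id for s in snapshots_service.list()]:
                return
            # Detect the failure mode from this bug: the disk goes ILLEGAL.
            disk = disks_service.disk_service(disk_id).get()
            if disk.status == types.DiskStatus.ILLEGAL:
                raise RuntimeError("live merge failed, disk %s is ILLEGAL" % disk_id)
            time.sleep(5)
        raise RuntimeError("timed out waiting for snapshot %s to be removed" % snap_id)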

Comment 8 Peter Krempa 2020-02-13 12:21:41 UTC
(In reply to Nir Soffer from comment #6)
> (In reply to Peter Krempa from comment #4)
> > (In reply to Nir Soffer from comment #3)

[...]

> > > Peter, can we change libvirt so block job is removed only after the backing
> > > chain is modified?
> > 
> > With pre-blockdev qemu/libvirt combination
> 
> Yes, this is a bout libvirt in RHEL 7.7 (4.5?)

Hmm, I can't guarantee that it will be possible to fix that version.

> > qemu doesn't report any state, so
> > the API returns nothing because qemu doesn't report the job while libvirt
> > didn't finish the processing yet.
> 
> What do you mean not reporting the job?
> 
> We assume that the libvirt block jobs are reported in virtDomainBlockJobInfo
> after
> virDomainBlockCommit was called, and that not job which is not reported by
> libvirt
> will never be reported again.

I don't quite understand what you meant.

Anyways, the old control flow is:

1) block-commit is started in qemu; its state can be tracked via virtDomainBlockJobInfo
2) block-commit finishes and QEMU fires an event

Insert race here:
at this point query-block-jobs, which is used by virtDomainBlockJobInfo, returns nothing, as the job has finished from qemu's PoV, and thus virtDomainBlockJobInfo also reports nothing

3) libvirt processes the event produced by qemu and updates its state

(every step above is considered atomic)

With the new control flow we ask qemu to keep the job around until step 3 is reached, so you either see the job state or the updated XML.

 
> We also assume that libvirt xml and block jobs are synced; if block job does
> not
> exist because a volume was removed from the chain in qemu, getting libvirt
> xml
> at this point will not show the volume in the backing chain.
> 
> Are these assumptions correct for libvirt 4.5? libvirt 6.0?

For libvirt 4.5 they are correct only if you take the event as the end of the job. The fact that virtDomainBlockJobInfo doesn't return anything does not necessarily mean libvirt finished processing the XML update.

That is fixed with libvirt-6 and qemu-4.2 because we ask qemu to keep the job around until we start processing it (so that from the API point of view the update is atomic) so that we can even report the current job state.
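
To illustrate the flow described above, a minimal libvirt-python sketch that treats the BLOCK_JOB completion event as the end of the job (the connection URI and VM name are assumptions; a real consumer such as vdsm would also keep a polling fallback for the restart case):

    import libvirt

    libvirt.virEventRegisterDefaultImpl()     # must be called before opening the connection
    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('test-vm')        # assumed VM name

    completed = []

    def block_job_cb(conn, dom, disk, job_type, status, opaque):
        # Per the discussion above, libvirt fires this only after it has
        # finished processing the qemu event, so the domain XML already
        # shows the updated backing chain at this point.
        if status == libvirt.VIR_DOMAIN_BLOCK_JOB_COMPLETED:
            completed.append(disk)

    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2, block_job_cb, None)

    # ... start the block commit elsewhere, then drive the event loop
    while not completed:
        libvirt.virEventRunDefaultImpl()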

Comment 9 Nir Soffer 2020-02-13 12:29:47 UTC
(In reply to Peter Krempa from comment #8)
Based on Peter's response, RHV 4.4 should be fine on RHEL 8.2 without changes.

If we want to fix this race in RHV 4.3, we need to wait for the block job
event before we continue with job cleanup.

If we don't get the block job completion event (maybe vdsm was just restarted)
we need to time out after a reasonable timeout and continue the cleanup as done
in the current code.

The attached patches are the first step towards such a fix.
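
A minimal sketch of that approach (not the actual attached patches; class and method names are illustrative): record the completion event per job UUID and, when the libvirt job disappears, wait a bounded time for the event before falling back to the existing chain-check cleanup:

    import threading

    class BlockJobEvents:
        def __init__(self):
            self._events = {}              # job_uuid -> threading.Event
            self._lock = threading.Lock()

        def register(self, job_uuid):
            # Called when the merge is started and the job becomes tracked.
            with self._lock:
                self._events[job_uuid] = threading.Event()

        def on_completion(self, job_uuid):
            # Called from the libvirt events thread on BLOCK_JOB completion.
            with self._lock:
                event = self._events.get(job_uuid)
            if event:
                event.set()

        def wait(self, job_uuid, timeout=10.0):
            # True if the completion event arrived in time; False means we
            # give up (e.g. vdsm was restarted and missed the event) and
            # fall back to checking the chain directly, as the current code does.
            with self._lock:
                event = self._events.get(job_uuid)
            return event.wait(timeout) if event else False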

Germano, do we have cases with this issue? How important is it to fix this in 4.3?

Comment 13 Nir Soffer 2020-02-16 13:59:55 UTC
Moving back to New. The bug status was changed by broken CI scripts.

Comment 15 Vojtech Juranek 2020-05-12 13:12:34 UTC
*** Bug 1814318 has been marked as a duplicate of this bug. ***

Comment 16 Evelina Shames 2020-09-24 05:49:01 UTC
Created attachment 1716204 [details]
logs

We saw it again in our automation tests.

Versions:
ovirt-engine-4.3.11.4-0.1.el7.noarch
vdsm-4.30.50-1.el7ev.x86_64
(rhv-4.3.11-8)

Attaching relevant logs.

Comment 17 Michal Skrivanek 2021-08-20 06:34:37 UTC
does it reproduce in 4.4? As per previous comments it should not, so once we confirm we can close this bug as fixed

Comment 18 Avihai 2021-08-23 10:33:14 UTC
(In reply to Michal Skrivanek from comment #17)
> does it reproduce in 4.4? As per previous comments it should not, so once we
> confirm we can close this bug as fixed

Comment 19 Avihai 2021-08-23 10:34:47 UTC
(In reply to Avihai from comment #18)
> (In reply to Michal Skrivanek from comment #17)
> > does it reproduce in 4.4? As per previous comments it should not, so once we
> > confirm we can close this bug as fixed

Evelina, did we encounter this issue in 4.4 runs?

Comment 20 Evelina Shames 2021-08-25 07:20:19 UTC
(In reply to Avihai from comment #19)
> (In reply to Avihai from comment #18)
> > (In reply to Michal Skrivanek from comment #17)
> > > does it reproduce in 4.4? As per previous comments it should not, so once we
> > > confirm we can close this bug as fixed
> 
> Evelina, did we encounter this issue in 4.4 runs?

No, we can close this as fixed.

Comment 21 Vojtech Juranek 2021-08-26 10:55:27 UTC
Removing the remaining needinfo flag.

