Bug 1507994

Summary: Failed to remove single disk from snapshot while vm is running
Product: [oVirt] ovirt-engine
Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: BLL.Storage
Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: amureini, bugs, kgoldbla
Target Milestone: ovirt-4.2.0
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-20 11:16:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1481688
Attachments: vdsm, server, engine logs

Description Kevin Alon Goldblatt 2017-10-31 16:13:43 UTC
Description of problem:
Deleting a single disk from a snapshot while the VM is running fails.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171029154613.git19686f3.el7.centos.noarch
vdsm-4.20.5-5.git3d47978.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with 2 disks, create a snapshot
2. Start the VM, write data, create a 2nd snapshot
3. Write data, create a 3rd snapshot
4. Via Webadmin -> Storage Domains -> hyperconverged Gluster domain -> Disk Snapshots, select the disk snapshot belonging to the 2nd snapshot and press Remove >>>>> Fails with "Failed to complete deletion of Disk 'clonefromtp1_Disk1' from snapshot(s) 'snaa3' of VM 'clonefromtp1' (User: admin@internal-authz)" (see the SDK sketch after these steps)

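For reference, the same disk-snapshot removal can also be driven programmatically. A minimal sketch using the oVirt Python SDK (ovirtsdk4) follows; the engine URL, credentials, storage domain name and disk-snapshot ID are placeholders, not values taken from this environment:
-------------------------
import ovirtsdk4 as sdk

# Minimal sketch, assuming placeholder engine URL/credentials and a
# placeholder disk-snapshot ID -- none of these come from this bug.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='PASSWORD',
    insecure=True,  # lab only; use ca_file in production
)
try:
    system = connection.system_service()

    # Locate the storage domain that holds the disk snapshots.
    sds_service = system.storage_domains_service()
    sd = sds_service.list(search='name=STORAGE_DOMAIN_NAME')[0]
    sd_service = sds_service.storage_domain_service(sd.id)

    # Remove a single disk snapshot -- the operation that fails in step 4.
    disk_snaps_service = sd_service.disk_snapshots_service()
    for disk_snap in disk_snaps_service.list():
        if disk_snap.id == 'DISK_SNAPSHOT_ID':
            disk_snaps_service.disk_snapshot_service(disk_snap.id).remove()
finally:
    connection.close()
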
Actual results:
Deleting the disk snapshot fails while the VM is running.

Expected results:
The disk snapshot should be removed successfully while the VM is running.

Additional info:
From engine logs:
-------------------------
2017-10-31 17:55:22,727+02 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Failed to live merge, still in volume chain: [4e01405c-80fd-40d4-b5aa-57ec08f23dc1, d566b684-1499-4afc-b74d-405102591aee]
2017-10-31 17:55:23,395+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] START, GlusterServersListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 651e0bc7
2017-10-31 17:55:23,581+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command id: '74055a27-4935-446d-a7ea-75855cd8209d failed child command status for step 'MERGE_STATUS'
2017-10-31 17:55:23,581+02 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command 'RemoveSnapshotSingleDiskLive' id: '74055a27-4935-446d-a7ea-75855cd8209d' child commands '[9d2a47b9-6d00-4a72-9178-890d415345fc, cb9a7cff-a704-4499-b344-052a54456ca6, fe05e4e0-bb07-476a-9bb7-dd58bc8f52de]' executions were completed, status 'FAILED'
2017-10-31 17:55:24,393+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] FINISH, GlusterServersListVDSCommand, return: [10.35.83.164/23:CONNECTED, vm-83-162.scl.lab.tlv.redhat.com:CONNECTED, vm-83-163.scl.lab.tlv.redhat.com:CONNECTED], log id: 651e0bc7
2017-10-31 17:55:24,403+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] START, GlusterVolumesListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, GlusterVolumesListVDSParameters:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 2f8d8eea
2017-10-31 17:55:24,679+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Merging of snapshot 'beaa2b2e-a8dc-4b58-ac0b-a29eaebafe35' images '4e01405c-80fd-40d4-b5aa-57ec08f23dc1'..'d566b684-1499-4afc-b74d-405102591aee' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2017-10-31 17:55:24,719+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2017-10-31 17:55:24,806+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command 'RemoveDiskSnapshots' id: '99cd1c4d-09cf-4279-a571-a431afd18126' child commands '[74055a27-4935-446d-a7ea-75855cd8209d]' executions were completed, status 'FAILED'
2017-10-31 17:55:24,953+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] FINISH, GlusterVolumesListVDSCommand, return: {12548706-4031-485f-b19f-8c0d651add1f=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@6dbb9f9a}, log id: 2f8d8eea
2017-10-31 17:55:25,959+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand' with failure.
2017-10-31 17:55:26,270+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] EVENT_ID: USER_REMOVE_DISK_SNAPSHOT_FINISHED_FAILURE(376), Failed to complete deletion of Disk 'clonefromtp1_Disk1' from snapshot(s) 'snaa3' of VM 'clonefromtp1' (User: admin@internal-authz).
2017-10-31 17:55:29,989+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler4) [340c9e07] START, GlusterServersListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 4fa79dac
2017-10-31 17:55:31,030+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler4) [340c9e07] FINISH, GlusterServersListVDSCommand, return: [10.35.83.164/23:CONNECTED, vm-83-162.scl.lab.tlv.redhat.com:CONNECTED, vm-83-163.scl.lab.tlv.redhat.com:CONNECTED], log id: 4fa79dac
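
The "still in volume chain" error above means the MERGE_STATUS step still found the merged volumes in the image's volume chain. As a diagnostic aid, the live backing chain of the disk can be inspected directly on the host; a minimal libvirt-python sketch, where the VM name and disk target ('sda') are assumptions:
-------------------------
import xml.etree.ElementTree as ET
import libvirt

# Minimal sketch, assuming VM name 'clonefromtp1' and disk target 'sda';
# prints the backing chain of the running disk, top volume first.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('clonefromtp1')

root = ET.fromstring(dom.XMLDesc(0))
for disk in root.findall('./devices/disk'):
    target = disk.find('target')
    if target is None or target.get('dev') != 'sda':
        continue
    node, depth = disk, 0
    while node is not None:
        source = node.find('source')
        if source is not None:
            # File disks use 'file'; Gluster network disks use 'name'.
            print('%d: %s' % (depth, source.get('file') or source.get('name') or ''))
        node = node.find('backingStore')
        depth += 1
conn.close()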

Comment 1 Kevin Alon Goldblatt 2017-10-31 16:20:32 UTC
Created attachment 1346001 [details]
vdsm, server, engine logs

Added logs

Comment 2 Allon Mureinik 2017-10-31 16:29:57 UTC
The log is full of domain monitoring errors.

Does this happen on non-HCI environments? On other HCI environments?

Comment 3 Kevin Alon Goldblatt 2017-11-05 14:14:17 UTC
(In reply to Allon Mureinik from comment #2)
> The log is full of domain monitoring errors.
> 
> Does this happen on non-HCI environments? On other HCI environments?

It works on a regular (non-HCI) environment.

Comment 4 Ala Hino 2017-11-09 10:53:03 UTC
@Kevin,

Did you perform any disk move?
The following vdsm error shows that disk replication fails to start:

2017-10-31 17:20:33,898+0200 ERROR (jsonrpc/7) [virt.vm] (vmId='7330e919-62bc-4e3f-990b-111e2fcbf87d') Unable to start replication for sda to {u'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a163
0821', 'volumeInfo': {'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f', 'protocol': 'gluster', 'ho
sts': [{'port': '0', 'transport': 'tcp', 'name': '10.35.83.240'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.241'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.242'}]}, 'for
mat': 'cow', u'poolID': u'96cc313a-67d4-49e8-8979-27d3fcfd171c', u'device': 'disk', 'protocol': 'gluster', 'propagateErrors': 'off', u'diskType': u'network', 'cache': 'none', u'volumeID': u'
2b886bc8-cece-470b-950a-0c3b8da5315f', u'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f', 'hosts': [{'port': '0', 'transport': 'tcp', 'name': '10.35.83.240'}], 'path': u'glust1_rep3/01bdd2
ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f', 'volumeChain': [{'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'lease
Offset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/4e01405c-80fd-40d4-b5aa-57ec08f23dc1', 'volumeID': u'4e01405c-80fd-40d4-b5a
a-57ec08f23dc1', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/4e01405c-80fd-40d4
-b5aa-57ec08f23dc1.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'}, {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e
4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f', 'volumeID': u'2b886bc8-cece-470b-950a-0c3b8da5315f', 'leasePath': u'/rhev/data-center/
mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f.lease', 'imageID': u'db4120d3-d
a1c-4297-ab67-8c53c2bbe62f'}, {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab6
7-8c53c2bbe62f/533d83bc-1473-4bc3-bb3f-41254eb786c0', 'volumeID': u'533d83bc-1473-4bc3-bb3f-41254eb786c0', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-
f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/533d83bc-1473-4bc3-bb3f-41254eb786c0.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'}, {'domainID': u'01bdd
2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b0b5b2f-a015-4593-9e59-e22667b9009
c', 'volumeID': u'2b0b5b2f-a015-4593-9e59-e22667b9009c', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4
297-ab67-8c53c2bbe62f/2b0b5b2f-a015-4593-9e59-e22667b9009c.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'}, {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0
, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/d566b684-1499-4afc-b74d-405102591aee', 'volumeID': u'd566b684-1499-4afc-b74d-40510259
1aee', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/d566b684-1499-4afc-b74d-4051
02591aee.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'}]} (vm:4356)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4350, in diskReplicateStart
    self._startDriveReplication(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4479, in _startDriveReplication
    self._dom.blockCopy(drive.name, destxml, flags=flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 688, in blockCopy
    if ret == -1: raise libvirtError ('virDomainBlockCopy() failed', dom=self)
libvirtError: argument unsupported: non-file destination not supported yet
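
The traceback shows virDomainBlockCopy rejecting a non-file (Gluster network) destination. To illustrate the shape of the failing call, a minimal libvirt-python sketch follows; the VM name, destination XML and flags are illustrative assumptions, not the exact vdsm parameters:
-------------------------
import libvirt

# Illustrative sketch of the kind of call made in _startDriveReplication;
# the destination XML, VM name and flag choice here are assumptions.
DEST_XML = """
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source protocol='gluster' name='VOLUME/SD_UUID/images/IMG_UUID/VOL_UUID'>
    <host name='10.35.83.240' port='0' transport='tcp'/>
  </source>
</disk>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('clonefromtp1')
flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
         libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)
try:
    # On libvirt/qemu stacks that cannot copy to a non-file destination,
    # this raises "argument unsupported: non-file destination not supported yet".
    dom.blockCopy('sda', DEST_XML, flags=flags)
except libvirt.libvirtError as e:
    print('blockCopy failed: %s' % e)
finally:
    conn.close()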

Comment 5 Allon Mureinik 2017-11-09 11:56:16 UTC
(In reply to Ala Hino from comment #4)
> @Kevin,
> 
> Did you perform any disk move?
> Following vdsm error shows that disk replication fails to start:

The underlying issue was solved by Denis' commit 2c03d2307b8da5fb6fc6eac026ea554beefc2dc2, merged on October 30th (and therefore not available in the build this bug was reported against).
Please retest with a newer engine build that contains the fix.

Comment 6 Kevin Alon Goldblatt 2017-11-27 15:23:38 UTC
Verified with the following code:
----------------------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.7-55.git11440d6.el7.centos.x86_64


Verified with the following scenario:
----------------------------------------------
Steps to Reproduce:
1. Create a VM with 2 disks, create a snapshot
2. Start the VM, write data, create a 2nd snapshot
3. Write data, create a 3rd snapshot
4. Via Webadmin -> Storage Domains -> hyperconverged Gluster domain -> Disk Snapshots, select the disk snapshot belonging to the 2nd snapshot and press Remove >>>>> The disk snapshot is successfully removed

Comment 7 Sandro Bonazzola 2017-12-20 11:16:34 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.