Bug 1507994 - Failed to remove single disk from snapshot while vm is running
Summary: Failed to remove single disk from snapshot while vm is running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Ala Hino
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Depends On:
Blocks: 1481688
 
Reported: 2017-10-31 16:13 UTC by Kevin Alon Goldblatt
Modified: 2017-12-20 11:16 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:16:34 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.2+


Attachments
vdsm, server, engine logs (2.92 MB, application/x-gzip)
2017-10-31 16:20 UTC, Kevin Alon Goldblatt

Description Kevin Alon Goldblatt 2017-10-31 16:13:43 UTC
Description of problem:
Deleting a single disk from a snapshot while the VM is up fails.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20171029154613.git19686f3.el7.centos.noarch
vdsm-4.20.5-5.git3d47978.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a vm with 2 disks, create snapshot
2. Start vm, write data, create 2nd snapshot
3. Write data, create 3rd snapshot
4. Via Webadmin -> Storage Domains -> Hyperconv Gluster domain -> Disk Snapshots, select the snapshot disk from the 2nd snapshot and press Remove. This fails with "Failed to complete deletion of Disk 'clonefromtp1_Disk1' from snapshot(s) 'snaa3' of VM 'clonefromtp1' (User: admin@internal-authz)" (an equivalent API call is sketched after the steps).
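
The same removal can also be driven through the REST API; below is a minimal sketch using the oVirt Python SDK v4 (URL, credentials and UUIDs are placeholders, and the service/method names should be double-checked against the SDK version in use):
----------------------------------------------
import ovirtsdk4 as sdk

# Placeholder connection details -- not taken from this environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # lab-style setup; use ca_file in production
)
try:
    # Navigate: storage domain -> its disk snapshots -> the one to delete.
    sd_service = (connection.system_service()
                            .storage_domains_service()
                            .storage_domain_service('STORAGE-DOMAIN-UUID'))
    snap_service = sd_service.disk_snapshots_service() \
                             .snapshot_service('DISK-SNAPSHOT-UUID')
    # Removal is asynchronous; the engine reports the final status as an audit
    # event, which is where the "Failed to complete deletion of Disk ..."
    # message quoted above comes from.
    snap_service.remove()
finally:
    connection.close()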

Actual results:
Deleting the disk snapshot from the live VM fails.

Expected results:
The disk snapshot should be removed successfully while the VM is running.

Additional info:
From engine logs:
-------------------------
2017-10-31 17:55:22,727+02 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Failed to live merge, still in volume chain: [4e01405c-80fd-40d4-b5aa-57ec08f23dc1, d566b684-1499-4afc-b74d-405102591aee]
2017-10-31 17:55:23,395+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] START, GlusterServersListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 651e0bc7
2017-10-31 17:55:23,581+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command id: '74055a27-4935-446d-a7ea-75855cd8209d failed child command status for step 'MERGE_STATUS'
2017-10-31 17:55:23,581+02 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command 'RemoveSnapshotSingleDiskLive' id: '74055a27-4935-446d-a7ea-75855cd8209d' child commands '[9d2a47b9-6d00-4a72-9178-890d415345fc, cb9a7cff-a704-4499-b344-052a54456ca6, fe05e4e0-bb07-476a-9bb7-dd58bc8f52de]' executions were completed, status 'FAILED'
2017-10-31 17:55:24,393+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] FINISH, GlusterServersListVDSCommand, return: [10.35.83.164/23:CONNECTED, vm-83-162.scl.lab.tlv.redhat.com:CONNECTED, vm-83-163.scl.lab.tlv.redhat.com:CONNECTED], log id: 651e0bc7
2017-10-31 17:55:24,403+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] START, GlusterVolumesListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, GlusterVolumesListVDSParameters:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 2f8d8eea
2017-10-31 17:55:24,679+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Merging of snapshot 'beaa2b2e-a8dc-4b58-ac0b-a29eaebafe35' images '4e01405c-80fd-40d4-b5aa-57ec08f23dc1'..'d566b684-1499-4afc-b74d-405102591aee' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2017-10-31 17:55:24,719+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
2017-10-31 17:55:24,806+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-97) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Command 'RemoveDiskSnapshots' id: '99cd1c4d-09cf-4279-a571-a431afd18126' child commands '[74055a27-4935-446d-a7ea-75855cd8209d]' executions were completed, status 'FAILED'
2017-10-31 17:55:24,953+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler5) [5cf39523] FINISH, GlusterVolumesListVDSCommand, return: {12548706-4031-485f-b19f-8c0d651add1f=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@6dbb9f9a}, log id: 2f8d8eea
2017-10-31 17:55:25,959+02 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveDiskSnapshotsCommand' with failure.
2017-10-31 17:55:26,270+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [a527abf0-ae22-4083-ac96-ee3d8cbcce3a] EVENT_ID: USER_REMOVE_DISK_SNAPSHOT_FINISHED_FAILURE(376), Failed to complete deletion of Disk 'clonefromtp1_Disk1' from snapshot(s) 'snaa3' of VM 'clonefromtp1' (User: admin@internal-authz).
2017-10-31 17:55:29,989+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler4) [340c9e07] START, GlusterServersListVDSCommand(HostName = vm-83-164.scl.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{hostId='265056d7-a5dc-4e16-baae-2db73c51e714'}), log id: 4fa79dac
2017-10-31 17:55:31,030+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler4) [340c9e07] FINISH, GlusterServersListVDSCommand, return: [10.35.83.164/23:CONNECTED, vm-83-162.scl.lab.tlv.redhat.com:CONNECTED, vm-83-163.scl.lab.tlv.redhat.com:CONNECTED], log id: 4fa79dac
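
The first ERROR line above is the one that decides the outcome: after the live merge, the MERGE_STATUS step asks the host for the drive's volume chain and still finds the images that were supposed to be merged away, so RemoveSnapshotSingleDiskLive and RemoveDiskSnapshots are rolled up as FAILED. Roughly, the check works like the sketch below (illustration only; the real logic lives in the Java class org.ovirt.engine.core.bll.MergeStatusCommand):
----------------------------------------------
# Illustration of the MERGE_STATUS idea, not the engine's actual (Java) code.
def merge_finished(images_to_merge, volume_chain_from_host):
    """The merge only counts as finished once none of the merged images
    are still present in the volume chain reported by the host."""
    leftovers = sorted(set(images_to_merge) & set(volume_chain_from_host))
    if leftovers:
        print("Failed to live merge, still in volume chain: %s" % leftovers)
        return False
    return True

# With the UUIDs from the log above, both images are still in the chain,
# so the step fails exactly as reported:
merge_finished(
    images_to_merge=['4e01405c-80fd-40d4-b5aa-57ec08f23dc1',
                     'd566b684-1499-4afc-b74d-405102591aee'],
    volume_chain_from_host=['4e01405c-80fd-40d4-b5aa-57ec08f23dc1',
                            'd566b684-1499-4afc-b74d-405102591aee'],
)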

Comment 1 Kevin Alon Goldblatt 2017-10-31 16:20:32 UTC
Created attachment 1346001 [details]
vdsm, server, engine logs

Added logs

Comment 2 Allon Mureinik 2017-10-31 16:29:57 UTC
The log is full of domain monitoring errors.

Does this happen on non-HCI environments? On other HCI environments?

Comment 3 Kevin Alon Goldblatt 2017-11-05 14:14:17 UTC
(In reply to Allon Mureinik from comment #2)
> The log is full of domain monitoring errors.
> 
> Does this happen on non-HCI environments? On other HCI environments?



It works on a regular (non-HCI) environment.

Comment 4 Ala Hino 2017-11-09 10:53:03 UTC
@Kevin,

Did you perform any disk move?
The following vdsm error shows that disk replication fails to start:

2017-10-31 17:20:33,898+0200 ERROR (jsonrpc/7) [virt.vm] (vmId='7330e919-62bc-4e3f-990b-111e2fcbf87d') Unable to start replication for sda to
{u'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821',
 'volumeInfo': {'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f', 'protocol': 'gluster', 'hosts': [{'port': '0', 'transport': 'tcp', 'name': '10.35.83.240'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.241'}, {'port': '0', 'transport': 'tcp', 'name': '10.35.83.242'}]},
 'format': 'cow', u'poolID': u'96cc313a-67d4-49e8-8979-27d3fcfd171c', u'device': 'disk', 'protocol': 'gluster', 'propagateErrors': 'off', u'diskType': u'network', 'cache': 'none',
 u'volumeID': u'2b886bc8-cece-470b-950a-0c3b8da5315f', u'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f',
 'hosts': [{'port': '0', 'transport': 'tcp', 'name': '10.35.83.240'}],
 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f',
 'volumeChain': [
   {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/4e01405c-80fd-40d4-b5aa-57ec08f23dc1', 'volumeID': u'4e01405c-80fd-40d4-b5aa-57ec08f23dc1', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/4e01405c-80fd-40d4-b5aa-57ec08f23dc1.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'},
   {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f', 'volumeID': u'2b886bc8-cece-470b-950a-0c3b8da5315f', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'},
   {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/533d83bc-1473-4bc3-bb3f-41254eb786c0', 'volumeID': u'533d83bc-1473-4bc3-bb3f-41254eb786c0', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/533d83bc-1473-4bc3-bb3f-41254eb786c0.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'},
   {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b0b5b2f-a015-4593-9e59-e22667b9009c', 'volumeID': u'2b0b5b2f-a015-4593-9e59-e22667b9009c', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b0b5b2f-a015-4593-9e59-e22667b9009c.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'},
   {'domainID': u'01bdd2ad-f506-4e4d-b1c5-e037a1630821', 'leaseOffset': 0, 'path': u'glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/d566b684-1499-4afc-b74d-405102591aee', 'volumeID': u'd566b684-1499-4afc-b74d-405102591aee', 'leasePath': u'/rhev/data-center/mnt/glusterSD/10.35.83.240:_glust1__rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/d566b684-1499-4afc-b74d-405102591aee.lease', 'imageID': u'db4120d3-da1c-4297-ab67-8c53c2bbe62f'}
 ]} (vm:4356)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4350, in diskReplicateStart
    self._startDriveReplication(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4479, in _startDriveReplication
    self._dom.blockCopy(drive.name, destxml, flags=flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 125, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 586, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 688, in blockCopy
    if ret == -1: raise libvirtError ('virDomainBlockCopy() failed', dom=self)
libvirtError: argument unsupported: non-file destination not supported yet
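
The libvirtError at the bottom is the root cause on the vdsm side: the destination XML that vdsm built for virDomainBlockCopy described a network (gluster) disk, and the libvirt build in use only accepted file destinations for that call. The sketch below shows the call shape for illustration; the XML and flags are reconstructed from the log above, not copied from vdsm's code:
----------------------------------------------
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('7330e919-62bc-4e3f-990b-111e2fcbf87d')

# A <disk type='network'> destination like this is what triggered
# "non-file destination not supported yet" on that libvirt version.
dest_xml = """
<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source protocol='gluster'
          name='glust1_rep3/01bdd2ad-f506-4e4d-b1c5-e037a1630821/images/db4120d3-da1c-4297-ab67-8c53c2bbe62f/2b886bc8-cece-470b-950a-0c3b8da5315f'>
    <host name='10.35.83.240' transport='tcp'/>
  </source>
</disk>
"""

# Typical flags for this kind of copy; not necessarily the exact ones vdsm passed.
flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
         libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)
dom.blockCopy('sda', dest_xml, flags=flags)  # this is where libvirtError was raised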

Comment 5 Allon Mureinik 2017-11-09 11:56:16 UTC
(In reply to Ala Hino from comment #4)
> @Kevin,
> 
> Did you perform any disk move?
> The following vdsm error shows that disk replication fails to start:

The underlying issue was solved by Denis' commit 2c03d2307b8da5fb6fc6eac026ea554beefc2dc2, merged on October 30th (which was not available in the build this bug was reported against).
Please retest with a newer engine build that contains the fix.

Comment 6 Kevin Alon Goldblatt 2017-11-27 15:23:38 UTC
Verified with the following code:
----------------------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.7-55.git11440d6.el7.centos.x86_64


Verified with the following scenario:
----------------------------------------------
Steps to Reproduce:
1. Create a vm with 2 disks, create snapshot
2. Start vm, write data, create 2nd snapshot
3. Write data, create 3rd snapshot
4. Via Webadmin -> Storage Domains -> Hyperconv Gluster domain -> Disk Snapshots, select the snapshot disk from the 2nd snapshot and press Remove. The disk snapshot is successfully removed (a programmatic check is sketched below).
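
For completeness, the same verification can be scripted: once the removal finishes, the deleted disk snapshot should no longer be listed on the storage domain. A minimal sketch with the oVirt Python SDK v4, reusing the same placeholder URL, credentials and UUIDs as the sketch in the description:
----------------------------------------------
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
try:
    sd_service = (connection.system_service()
                            .storage_domains_service()
                            .storage_domain_service('STORAGE-DOMAIN-UUID'))
    remaining = sd_service.disk_snapshots_service().list()
    # The removed disk snapshot must be gone from the domain's snapshot list.
    assert 'DISK-SNAPSHOT-UUID' not in [snap.id for snap in remaining]
finally:
    connection.close()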

Comment 7 Sandro Bonazzola 2017-12-20 11:16:34 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

