Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1664338

Summary: Live storage migration fails during diskReplicateStart with VolumeDoesNotExist for the new snapshot created on the target domain
Product: [oVirt] ovirt-engine Reporter: Elad <ebenahar>
Component: BLL.Storage Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED CURRENTRELEASE QA Contact: Yosi Ben Shimon <ybenshim>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.3.0 CC: aefrat, bugs, tnisan
Target Milestone: ovirt-4.3.5 Keywords: Automation
Target Release: 4.3.5 Flags: rule-engine: ovirt-4.3+
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.3.5 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-30 14:08:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description           Flags
engine and vdsm logs  none
logs2                 none

Description Elad 2019-01-08 14:00:05 UTC
Created attachment 1519196 [details]
engine and vdsm logs

Description of problem:
During live storage migration, diskReplicateStart failed: prepareImage returned an error that the volume does not exist. The disk move failed.

Version-Release number of selected component (if applicable):
vdsm-4.30.4-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64
kernel - 3.10.0-957.3.1.el7.x86_64
RHEL7.6

How reproducible:
Happened once

Steps to Reproduce:
Happened after the following flow:
1. Create a VM (blank) with a thin cow disk attached on NFS
2. Resize the disk
3. Create a snapshot for the VM with the disk
4. Resize the disk
5. Preview the snapshot
6. Commit the snapshot
7. Start the VM
8. Live migrate the VM's disk (see the SDK sketch below)
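
A minimal sketch of this flow using ovirt-engine-sdk4 (the oVirt Python SDK). This is a hedged reproduction outline, not the reporter's actual test code: the engine URL, credentials, and VM/cluster/storage-domain names are placeholders, and the sleeps stand in for proper polling of each asynchronous step.

import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

GiB = 1024 ** 3

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',
    insecure=True,  # lab setup; use ca_file in production
)
vms_service = connection.system_service().vms_service()

# 1. Blank VM with a thin (sparse) COW disk on an NFS domain
vm = vms_service.add(types.Vm(
    name='lsm_repro_vm',
    cluster=types.Cluster(name='mycluster'),
    template=types.Template(name='Blank'),
))
vm_service = vms_service.vm_service(vm.id)
attachment = vm_service.disk_attachments_service().add(types.DiskAttachment(
    disk=types.Disk(
        name='lsm_repro_disk',
        format=types.DiskFormat.COW,
        sparse=True,
        provisioned_size=1 * GiB,
        storage_domains=[types.StorageDomain(name='nfs_source')],
    ),
    interface=types.DiskInterface.VIRTIO,
    bootable=True,
    active=True,
))
disk_service = connection.system_service().disks_service().disk_service(
    attachment.disk.id)
time.sleep(60)  # real code should poll the disk status instead

# 2. Resize the disk
disk_service.update(types.Disk(provisioned_size=2 * GiB))
time.sleep(60)

# 3. Snapshot, 4. resize again, 5. preview, 6. commit
snaps_service = vm_service.snapshots_service()
snap = snaps_service.add(types.Snapshot(description='snap1'))
time.sleep(60)
disk_service.update(types.Disk(provisioned_size=3 * GiB))
time.sleep(60)
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snap.id),
                            restore_memory=False)
time.sleep(60)
vm_service.commit_snapshot()
time.sleep(60)

# 7. Start the VM, 8. live-migrate its disk to another domain
vm_service.start()
time.sleep(60)
disk_service.move(storage_domain=types.StorageDomain(name='nfs_target'))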



Actual results:
Move disk starts:

2019-01-08 12:39:44,222+02 INFO  [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: MoveDiskCommand internal: false. Entities affected :  ID: 613de8d9-d366-4acd-a75a-22a2c617196d Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER,  ID: fdd43a70-b9cc-4029-bb4b-4707590b70f7 Type: StorageAction group CREATE_DISK with role type USER



CreateVolume for the snapshot placeholder (as part of create snapshot; note that the storageDomainId below is the source domain, 18889b9b-dcab-4542-bdb4-1d878f40657a):


2019-01-08 12:39:45,084+02 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotForVmCommand internal: true. Entities affected :  ID: e1b826d4-c725-47bd-97e0-e89c1c3b2590 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2019-01-08 12:39:45,145+02 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotDiskCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotDiskCommand internal: true. Entities affected :  ID: e1b826d4-c725-47bd-97e0-e89c1c3b2590 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2019-01-08 12:39:45,239+02 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotCommand internal: true. Entities affected :  ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2019-01-08 12:39:45,313+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='2288b8b4-06e6-44cc-8294-3f6ceec565f5', ignoreFailoverLimit='false', storageDomainId='18889b9b-dcab-4542-bdb4-1d878f40657a', imageGroupId='613de8d9-d366-4acd-a75a-22a2c617196d', imageSizeInBytes='11811160064', volumeFormat='COW', newImageId='fdadd8ee-27ff-47e5-a89c-163727d2d934', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='7d66d729-0f3a-41ad-bc63-74792a816c2d', sourceImageGroupId='613de8d9-d366-4acd-a75a-22a2c617196d'}), log id: 30af3cb7



And fails on vdsm during diskReplicateStart. Note that dstDisk carries the same volumeID as srcDisk (fdadd8ee-27ff-47e5-a89c-163727d2d934) but the target domainID (fdd43a70-b9cc-4029-bb4b-4707590b70f7), and it is against that target domain that prepareImage fails:

2019-01-08 12:40:14,102+0200 INFO  (jsonrpc/7) [api.virt] START diskReplicateStart(srcDisk={u'device': u'disk', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'domainID': u'18889b9b-dcab-4542-bdb4-1d878f40657a', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d'}, dstDisk={u'device': u'disk', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d'}) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, vmId=e1b826d4-c725-47bd-97e0-e89c1c3b2590 (api:48)
2019-01-08 12:40:14,141+0200 INFO  (jsonrpc/7) [vdsm.api] START prepareImage(sdUUID=u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', spUUID=u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', imgUUID=u'613de8d9-d366-4acd-a75a-22a2c617196d', leafUUID=u'fdadd8ee-27ff-47e5-a89c-163727d2d934', allowIllegal=False) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, task_id=3319baf3-1048-4c88-b5da-a6c783ffcaa1 (api:48)
2019-01-08 12:40:14,157+0200 INFO  (jsonrpc/7) [vdsm.api] FINISH prepareImage error=Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, task_id=3319baf3-1048-4c88-b5da-a6c783ffcaa1 (api:52)
2019-01-08 12:40:14,158+0200 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3319baf3-1048-4c88-b5da-a6c783ffcaa1') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3176, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',)
2019-01-08 12:40:14,160+0200 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='3319baf3-1048-4c88-b5da-a6c783ffcaa1') aborting: Task is aborted: "Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',)" - code 201 (task:1181)
2019-01-08 12:40:14,162+0200 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',) (dispatcher:81)
2019-01-08 12:40:14,163+0200 ERROR (jsonrpc/7) [virt.vm] (vmId='e1b826d4-c725-47bd-97e0-e89c1c3b2590') Unable to start replication for vda to {u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', 'format': 'cow', 'cache': 'none', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'device': 'disk', 'propagateErrors': 'off'} (vm:4558)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4546, in diskReplicateStart
    replica['path'] = self.cif.prepareVolumePath(replica)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', 'format': 'cow', 'cache': 'none', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'device': 'disk', 'propagateErrors': 'off'}
2019-01-08 12:40:14,186+0200 INFO  (jsonrpc/7) [api.virt] FINISH diskReplicateStart return={'status': {'message': 'Drive replication error', 'code': 55}} from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, vmId=e1b826d4-c725-47bd-97e0-e89c1c3b2590 (api:54)


Expected results:
LSM should succeed

Additional info:
The new volume exists after the failure, but under the source domain (18889b9b-dcab-4542-bdb4-1d878f40657a) rather than the target domain (fdd43a70-b9cc-4029-bb4b-4707590b70f7) that prepareImage was called against:


[root@storage-ge6-vdsm1 ~]# ll /rhev/data-center/2288b8b4-06e6-44cc-8294-3f6ceec565f5/18889b9b-dcab-4542-bdb4-1d878f40657a/images/613de8d9-d366-4acd-a75a-22a2c617196d/
total 4904
-rw-rw----. 1 vdsm kvm  262352 Jan  8 12:15 627003b6-cfe3-4a47-9c73-37d9c37b61d2
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:13 627003b6-cfe3-4a47-9c73-37d9c37b61d2.lease
-rw-r--r--. 1 vdsm kvm     268 Jan  8 12:15 627003b6-cfe3-4a47-9c73-37d9c37b61d2.meta
-rw-rw----. 1 vdsm kvm  196784 Jan  8 12:24 7d66d729-0f3a-41ad-bc63-74792a816c2d
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:24 7d66d729-0f3a-41ad-bc63-74792a816c2d.lease
-rw-r--r--. 1 vdsm kvm     272 Jan  8 12:39 7d66d729-0f3a-41ad-bc63-74792a816c2d.meta
-rw-rw----. 1 vdsm kvm  262320 Jan  8 12:08 9a5dd37d-9428-4733-bdf0-deab1af3ffd7
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 11:52 9a5dd37d-9428-4733-bdf0-deab1af3ffd7.lease
-rw-r--r--. 1 vdsm kvm     336 Jan  8 12:13 9a5dd37d-9428-4733-bdf0-deab1af3ffd7.meta
-rw-rw----. 1 vdsm kvm  196784 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934.lease
-rw-r--r--. 1 vdsm kvm     268 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934.meta
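
A quick, hedged way to confirm where the leaf actually landed (the path layout and UUIDs are taken from the listing and logs above; this snippet is an illustration, not part of the original report):

import os

POOL = '2288b8b4-06e6-44cc-8294-3f6ceec565f5'
IMG = '613de8d9-d366-4acd-a75a-22a2c617196d'
VOL = 'fdadd8ee-27ff-47e5-a89c-163727d2d934'
DOMAINS = {
    'source': '18889b9b-dcab-4542-bdb4-1d878f40657a',
    'target': 'fdd43a70-b9cc-4029-bb4b-4707590b70f7',
}

# The leaf should be present on both sides before replication starts;
# here it turns out to be present on the source and missing on the target.
for name, sd in DOMAINS.items():
    path = os.path.join('/rhev/data-center', POOL, sd, 'images', IMG, VOL)
    print('%s: %s' % (name, 'present' if os.path.exists(path) else 'missing'))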

Comment 1 Elad 2019-01-10 17:51:31 UTC
Created attachment 1519883 [details]
logs2

Encountered again:

2019-01-10 19:10:03,900+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='e471cae3-69fc-47f0-8a48-d44155f192a8') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3176, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'f08c3a6e-18e7-4b11-bd2f-231e364f694a',)
2019-01-10 19:10:03,900+0200 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='e471cae3-69fc-47f0-8a48-d44155f192a8') Task._run: e471cae3-69fc-47f0-8a48-d44155f192a8 (u'4a48ac50-8707-4e22-ae74-35ea0fddd3b2', u'b3f3778b-0a12-49fe-97e6-06bb5072f104', u'94a43837-bdab-4eaa-8de1-869f54051837', u'f08c3a6e-18e7-4b11-bd2f-231e364f694a') {} failed - stopping task (task:894)

Comment 2 Benny Zlotnik 2019-06-04 14:12:32 UTC
So the issue here is that a snapshot that is currently in preview is being cloned to create a VM:

2019-01-10 18:51:32,042+02 INFO  [org.ovirt.engine.core.bll.AddVmFromSnapshotCommand] (default task-143) [vms_create_866e5f95-5892-4f7e] Lock Acquired to object 'EngineLock:{exclusiveLocks='[vm_random_test_1018512639=VM_NAME, 076d098a-7e74-423b-841a-1b17e920f0b4=VM]', sharedLocks=''}'


This is why the commit/undo fails with this error:

2019-01-10 19:04:37,744+02 WARN  [org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand] (default task-143) [vms_syncAction_97379440-645c-4e55] Validation of action 'RestoreAllSnapshots' failed for user admin@internal-authz. Reasons: VAR__ACTION__REVERT_TO,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_VM_SNAPSHOT_DOES_NOT_EXIST
2019-01-10 19:04:37,745+02 INFO  [org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand] (default task-143) [vms_syncAction_97379440-645c-4e55] Lock freed to object 'EngineLock:{exclusiveLocks='[076d098a-7e74-423b-841a-1b17e920f0b4=VM]', sharedLocks=''}'
2019-01-10 19:04:37,757+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-143) [] Operation Failed: [Cannot revert to Snapshot. VM's Snapshot does not exist.]

This is possible only via REST (due to a missing validation); in the UI, all operations except undo/commit are disabled while a snapshot is in preview.
So when the clone is issued, it resets the snapshot's status to OK and breaks the subsequent snapshot operations.

And since the preview leaves two volumes with the same parent (the first snapshot's volume and the volume created for the preview), the chain created for cloneImageStructure is incorrect.
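
A minimal illustration (a toy model, not vdsm's or the engine's actual code) of why a same-parent duplicate breaks naive chain building:

def build_chain(volumes, base):
    """volumes maps volume_id -> parent_id; walk from base to the leaf."""
    children = {}
    for vol_id, parent_id in volumes.items():
        # Assumes one child per parent; a second child of the same
        # parent silently overwrites the first.
        children[parent_id] = vol_id
    chain, cur = [base], base
    while cur in children:
        cur = children[cur]
        chain.append(cur)
    return chain

volumes = {
    'post_snap_leaf': 'snap_base',  # leaf created when the snapshot was taken
    'preview_leaf': 'snap_base',    # leaf created by the preview: same parent
}
print(build_chain(volumes, 'snap_base'))
# -> ['snap_base', 'preview_leaf']: 'post_snap_leaf' was silently dropped,
# so a chain cloned from this structure misses a volume.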

To reproduce (a hedged SDK sketch follows these steps):
1. Create a VM with a disk
2. Create a snapshot
3. Preview the snapshot
4. Clone a VM from the snapshot (via REST)
5. Start the VM and move the disk (though cold move is affected by this as well)
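
A hedged sketch of steps 2-4 with ovirt-engine-sdk4, reusing the placeholder connection, vms_service, vm_service, and snaps_service objects from the sketch in the description; on a fixed engine, step 4 now fails with the validation shown in comment 3:

snap = snaps_service.add(types.Snapshot(description='snap1'))      # step 2
vm_service.preview_snapshot(snapshot=types.Snapshot(id=snap.id),   # step 3
                            restore_memory=False)
# Step 4: clone a new VM from the still-previewed snapshot. The UI blocks
# this; via REST it used to slip through the missing validation.
vms_service.add(types.Vm(
    name='vm_clone',
    cluster=types.Cluster(name='mycluster'),
    snapshots=[types.Snapshot(id=snap.id)],
))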

Comment 3 Yosi Ben Shimon 2019-07-04 08:47:32 UTC
Tested using:
ovirt-engine-4.3.5.2-0.1.el7.noarch

Tried both scenarios from the bug description and comment #2. Both work fine.

The bug description scenario went fine without any issues.

About the steps in comment #2:
The REST API response from step 4 is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot add VM. VM is previewing a Snapshot.]</detail>
    <reason>Operation Failed</reason>
</fault>


Engine log:

2019-07-04 11:29:42,217+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-20) [] Operation Failed: [Cannot add VM. VM is previewing a Snapshot.]

Moving to VERIFIED.

Comment 4 Sandro Bonazzola 2019-07-30 14:08:43 UTC
This bugzilla is included in the oVirt 4.3.5 release, published on July 30th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.