Bug 1664338
| Summary: | Live storage migration fails during diskReplicateStart with VolumeDoesNotExist for the new snapshot created on the target domain | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Elad <ebenahar> |
| Component: | BLL.Storage | Assignee: | Benny Zlotnik <bzlotnik> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Yosi Ben Shimon <ybenshim> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | aefrat, bugs, tnisan |
| Target Milestone: | ovirt-4.3.5 | Keywords: | Automation |
| Target Release: | 4.3.5 | Flags: | rule-engine: ovirt-4.3+ |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.3.5 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-30 14:08:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 1519883 [details]
logs2
Encountered again:
2019-01-10 19:10:03,900+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='e471cae3-69fc-47f0-8a48-d44155f192a8') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3176, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'f08c3a6e-18e7-4b11-bd2f-231e364f694a',)
2019-01-10 19:10:03,900+0200 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='e471cae3-69fc-47f0-8a48-d44155f192a8') Task._run: e471cae3-69fc-47f0-8a48-d44155f192a8 (u'4a48ac50-8707-4e22-ae74-35ea0fddd3b2', u'b3f3778b-0a12-49fe-97e6-06bb5072f104', u'94a43837-bdab-4eaa-8de1-869f54051837', u'f08c3a6e-18e7-4b11-bd2f-231e364f694a') {} failed - stopping task (task:894)
So the issue here is that a snapshot that is currently in preview is being cloned to create a VM:
2019-01-10 18:51:32,042+02 INFO [org.ovirt.engine.core.bll.AddVmFromSnapshotCommand] (default task-143) [vms_create_866e5f95-5892-4f7e] Lock Acquired to object 'EngineLock:{exclusiveLocks='[vm_random_test_1018512639=VM_NAME, 076d098a-7e74-423b-841a-1b17e920f0b4=VM]', sharedLocks=''}'
This is why the commit/undo fails with this error:
2019-01-10 19:04:37,744+02 WARN [org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand] (default task-143) [vms_syncAction_97379440-645c-4e55] Validation of action 'RestoreAllSnapshots' failed for user admin@internal-authz. Reasons: VAR__ACTION__REVERT_TO,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_VM_SNAPSHOT_DOES_NOT_EXIST
2019-01-10 19:04:37,745+02 INFO [org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand] (default task-143) [vms_syncAction_97379440-645c-4e55] Lock freed to object 'EngineLock:{exclusiveLocks='[076d098a-7e74-423b-841a-1b17e920f0b4=VM]', sharedLocks=''}'
2019-01-10 19:04:37,757+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-143) [] Operation Failed: [Cannot revert to Snapshot. VM's Snapshot does not exist.]
This is possible only via REST (due to a missing validation); in the UI, all operations except undo/commit are disabled while a snapshot is in preview.
So when the clone is issued, it resets the snapshot's status to OK, which breaks the rest of the flow.
And since the preview leaves two volumes with the same parent (the first snapshot's volume and the volume created for the preview), the chain built for cloneImageStructure is incorrect.
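To make the "two volumes with the same parent" failure concrete, here is a minimal illustrative sketch (not vdsm code; volume names are hypothetical) of building a linear chain by following parent links, as cloneImageStructure implicitly assumes is possible:

```python
def build_chain(volumes):
    """volumes: dict of volume_id -> parent_id (None for the base volume).
    Returns the chain from base to leaf, or raises if the chain forks."""
    children = {}
    for vol, parent in volumes.items():
        children.setdefault(parent, []).append(vol)
    chain = []
    cur = children.get(None, [None])[0]  # start at the base volume
    while cur is not None:
        chain.append(cur)
        kids = children.get(cur, [])
        if len(kids) > 1:
            # Two volumes share the same parent -- e.g. the snapshot's
            # child and the preview volume. No linear chain exists.
            raise ValueError("ambiguous chain: %s has children %s" % (cur, kids))
        cur = kids[0] if kids else None
    return chain

# A healthy chain: base <- snap <- leaf
print(build_chain({"base": None, "snap": "base", "leaf": "snap"}))
# → ['base', 'snap', 'leaf']

# After a preview, the original child and the preview volume
# both point at the same parent, so the chain is ambiguous:
try:
    build_chain({"base": None, "snap": "base", "preview": "base"})
except ValueError as e:
    print(e)
```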
To reproduce:
1. Create a VM with a disk
2. Create a snapshot
3. Preview the snapshot
4. Clone a VM from the snapshot (via REST)
5. Start the VM and move the disk (though cold move is affected by this as well)
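Step 4 can be sketched as a REST request body. This is a hedged illustration: the oVirt REST API exposes VM-from-snapshot cloning as a POST to `/ovirt-engine/api/vms` with a `<snapshots>` element, but the VM name, cluster name, and snapshot ID below are placeholders.

```python
# Build the XML body for cloning a VM from a snapshot via the oVirt REST API.
# All names and IDs are placeholders for illustration.
import xml.etree.ElementTree as ET

def clone_vm_body(vm_name, cluster_name, snapshot_id):
    """Return the XML payload for POST /ovirt-engine/api/vms."""
    vm = ET.Element("vm")
    ET.SubElement(vm, "name").text = vm_name
    cluster = ET.SubElement(vm, "cluster")
    ET.SubElement(cluster, "name").text = cluster_name
    snapshots = ET.SubElement(vm, "snapshots")
    ET.SubElement(snapshots, "snapshot", id=snapshot_id)
    return ET.tostring(vm, encoding="unicode")

body = clone_vm_body("cloned_vm", "Default",
                     "00000000-0000-0000-0000-000000000000")
print(body)

# It could then be POSTed with, e.g.:
#   curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
#        -d "$BODY" https://engine.example.com/ovirt-engine/api/vms
```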
Tested using: ovirt-engine-4.3.5.2-0.1.el7.noarch

Tried both scenarios from the bug description and comment #2. Both work fine. The bug description scenario went through without any issues.

About the steps in comment #2, the response (REST API) I got from step 4 is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>[Cannot add VM. VM is previewing a Snapshot.]</detail>
    <reason>Operation Failed</reason>
</fault>

Engine log:

2019-07-04 11:29:42,217+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-20) [] Operation Failed: [Cannot add VM. VM is previewing a Snapshot.]

Moving to VERIFIED.

This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Created attachment 1519196 [details]
engine and vdsm logs

Description of problem:
During live storage migration's diskReplicateStart, got an error for prepareImage that the volume doesn't exist. Disk move failed.

Version-Release number of selected component (if applicable):
vdsm-4.30.4-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64
kernel-3.10.0-957.3.1.el7.x86_64
RHEL 7.6

How reproducible:
Happened once

Steps to Reproduce:
Happened after the following flow:
1. Create a VM (blank) with a thin COW disk attached on NFS
2. Resize the disk
3. Create a snapshot for the VM with the disk
4. Resize the disk
5. Preview the snapshot
6. Commit the snapshot
7. Start the VM
8. Live migrate the VM's disk

Actual results:
Move disk starts:

2019-01-08 12:39:44,222+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: MoveDiskCommand internal: false. Entities affected: ID: 613de8d9-d366-4acd-a75a-22a2c617196d Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER, ID: fdd43a70-b9cc-4029-bb4b-4707590b70f7 Type: StorageAction group CREATE_DISK with role type USER

CreateVolume for the placeholder on the target domain (as part of create snapshot):

2019-01-08 12:39:45,084+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotForVmCommand internal: true. Entities affected: ID: e1b826d4-c725-47bd-97e0-e89c1c3b2590 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2019-01-08 12:39:45,145+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotDiskCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotDiskCommand internal: true. Entities affected: ID: e1b826d4-c725-47bd-97e0-e89c1c3b2590 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2019-01-08 12:39:45,239+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] Running command: CreateSnapshotCommand internal: true. Entities affected: ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2019-01-08 12:39:45,313+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-243) [disks_syncAction_b8ec2474-5ea8-49c9] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='2288b8b4-06e6-44cc-8294-3f6ceec565f5', ignoreFailoverLimit='false', storageDomainId='18889b9b-dcab-4542-bdb4-1d878f40657a', imageGroupId='613de8d9-d366-4acd-a75a-22a2c617196d', imageSizeInBytes='11811160064', volumeFormat='COW', newImageId='fdadd8ee-27ff-47e5-a89c-163727d2d934', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='7d66d729-0f3a-41ad-bc63-74792a816c2d', sourceImageGroupId='613de8d9-d366-4acd-a75a-22a2c617196d'}), log id: 30af3cb7

And fails on vdsm during diskReplicateStart:

2019-01-08 12:40:14,102+0200 INFO (jsonrpc/7) [api.virt] START diskReplicateStart(srcDisk={u'device': u'disk', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'domainID': u'18889b9b-dcab-4542-bdb4-1d878f40657a', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d'}, dstDisk={u'device': u'disk', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d'}) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, vmId=e1b826d4-c725-47bd-97e0-e89c1c3b2590 (api:48)
2019-01-08 12:40:14,141+0200 INFO (jsonrpc/7) [vdsm.api] START prepareImage(sdUUID=u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', spUUID=u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', imgUUID=u'613de8d9-d366-4acd-a75a-22a2c617196d', leafUUID=u'fdadd8ee-27ff-47e5-a89c-163727d2d934', allowIllegal=False) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, task_id=3319baf3-1048-4c88-b5da-a6c783ffcaa1 (api:48)
2019-01-08 12:40:14,157+0200 INFO (jsonrpc/7) [vdsm.api] FINISH prepareImage error=Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',) from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, task_id=3319baf3-1048-4c88-b5da-a6c783ffcaa1 (api:52)
2019-01-08 12:40:14,158+0200 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='3319baf3-1048-4c88-b5da-a6c783ffcaa1') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in prepareImage
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3176, in prepareImage
    raise se.VolumeDoesNotExist(leafUUID)
VolumeDoesNotExist: Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',)
2019-01-08 12:40:14,160+0200 INFO (jsonrpc/7) [storage.TaskManager.Task] (Task='3319baf3-1048-4c88-b5da-a6c783ffcaa1') aborting: Task is aborted: "Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',)" - code 201 (task:1181)
2019-01-08 12:40:14,162+0200 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH prepareImage error=Volume does not exist: (u'fdadd8ee-27ff-47e5-a89c-163727d2d934',) (dispatcher:81)
2019-01-08 12:40:14,163+0200 ERROR (jsonrpc/7) [virt.vm] (vmId='e1b826d4-c725-47bd-97e0-e89c1c3b2590') Unable to start replication for vda to {u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', 'format': 'cow', 'cache': 'none', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'device': 'disk', 'propagateErrors': 'off'} (vm:4558)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4546, in diskReplicateStart
    replica['path'] = self.cif.prepareVolumePath(replica)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 415, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {u'domainID': u'fdd43a70-b9cc-4029-bb4b-4707590b70f7', 'format': 'cow', 'cache': 'none', u'volumeID': u'fdadd8ee-27ff-47e5-a89c-163727d2d934', u'imageID': u'613de8d9-d366-4acd-a75a-22a2c617196d', u'poolID': u'2288b8b4-06e6-44cc-8294-3f6ceec565f5', u'device': 'disk', 'propagateErrors': 'off'}
2019-01-08 12:40:14,186+0200 INFO (jsonrpc/7) [api.virt] FINISH diskReplicateStart return={'status': {'message': 'Drive replication error', 'code': 55}} from=::ffff:10.35.161.127,46974, flow_id=disks_syncAction_b8ec2474-5ea8-49c9, vmId=e1b826d4-c725-47bd-97e0-e89c1c3b2590 (api:54)

Expected results:
LSM should succeed

Additional info:
The new volume exists after the failure:

[root@storage-ge6-vdsm1 ~]# ll /rhev/data-center/2288b8b4-06e6-44cc-8294-3f6ceec565f5/18889b9b-dcab-4542-bdb4-1d878f40657a/images/613de8d9-d366-4acd-a75a-22a2c617196d/
total 4904
-rw-rw----. 1 vdsm kvm  262352 Jan  8 12:15 627003b6-cfe3-4a47-9c73-37d9c37b61d2
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:13 627003b6-cfe3-4a47-9c73-37d9c37b61d2.lease
-rw-r--r--. 1 vdsm kvm     268 Jan  8 12:15 627003b6-cfe3-4a47-9c73-37d9c37b61d2.meta
-rw-rw----. 1 vdsm kvm  196784 Jan  8 12:24 7d66d729-0f3a-41ad-bc63-74792a816c2d
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:24 7d66d729-0f3a-41ad-bc63-74792a816c2d.lease
-rw-r--r--. 1 vdsm kvm     272 Jan  8 12:39 7d66d729-0f3a-41ad-bc63-74792a816c2d.meta
-rw-rw----. 1 vdsm kvm  262320 Jan  8 12:08 9a5dd37d-9428-4733-bdf0-deab1af3ffd7
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 11:52 9a5dd37d-9428-4733-bdf0-deab1af3ffd7.lease
-rw-r--r--. 1 vdsm kvm     336 Jan  8 12:13 9a5dd37d-9428-4733-bdf0-deab1af3ffd7.meta
-rw-rw----. 1 vdsm kvm  196784 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934
-rw-rw----. 1 vdsm kvm 1048576 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934.lease
-rw-r--r--. 1 vdsm kvm     268 Jan  8 12:39 fdadd8ee-27ff-47e5-a89c-163727d2d934.meta
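The failure mode above can be summarized in a small sketch: the new volume fdadd8ee... was created under the source domain (18889b9b...), but dstDisk pointed prepareImage at the target domain (fdd43a70...), so the leaf lookup under the target's image path found nothing. The helper below is purely illustrative (not vdsm's actual lookup), using a throwaway directory layout that mimics /rhev/data-center:

```python
import os
import tempfile

def leaf_exists(repo, pool_id, dom_id, img_id, leaf_id):
    """Check for the leaf volume file under a domain's image directory."""
    path = os.path.join(repo, pool_id, dom_id, "images", img_id, leaf_id)
    return os.path.exists(path)

# Build a throwaway layout: the leaf exists only under the source domain.
repo = tempfile.mkdtemp()
pool, src_dom, dst_dom = "pool", "src-domain", "dst-domain"
img, leaf = "image", "leaf-volume"
os.makedirs(os.path.join(repo, pool, src_dom, "images", img))
open(os.path.join(repo, pool, src_dom, "images", img, leaf), "w").close()

print(leaf_exists(repo, pool, src_dom, img, leaf))  # the volume is here
print(leaf_exists(repo, pool, dst_dom, img, leaf))  # but not on the target
```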