Created attachment 1419911 [details]
engine and vdsm logs

Description of problem:
After an LSM that previously failed because the source (NFS) domain was blocked, unblocking the source domain and running the same LSM again fails with:

2018-04-10 16:17:36,233+0300 ERROR (jsonrpc/4) [virt.vm] (vmId='87a3516c-a6ce-4afc-abb8-8bc2eda67e02') The base volume doesn't exist: {'device': 'disk', 'domainID': 'f3d0fa77-a26f-4e99-99db-2d0a1770acbe', 'volumeID': 'de39eb44-28c8-4a0d-ace1-27e95ddd5611', 'imageID': '3f032b55-2d3a-4a64-b061-66c525093be2'} (vm:4404)
2018-04-10 16:17:36,234+0300 INFO (jsonrpc/4) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:10.35.162.9,55622, flow_id=30e59408-257b-4f6e-9314-c8cb2c67af76, vmId=87a3516c-a6ce-4afc-abb8-8bc2eda67e02 (api:52)

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.2.6-0.1.el7.noarch
vdsm-4.20.23-1.el7ev.x86_64

How reproducible:
Happened 2/2 times with an NFS source domain. Did not happen with an iSCSI source domain.

Steps to Reproduce:
1. Create a VM with disks and an OS, and write data to the disks.
2. Start LSM of the NFS disk to an iSCSI domain and block (iptables) access to the source (NFS) domain during the operation -> the host becomes Non Operational and the LSM fails.
3. Unblock (iptables) access to the source (NFS) domain -> the host becomes operational again.
4. Run the same LSM again -> it fails because snapshot creation fails with "base volume not found".

Actual results:
LSM fails because snapshot creation fails due to "base volume not found".

Expected results:
LSM should work.

Additional info:

VDSM log - HSM - was SPM before step 2 - storage-ge10-vdsm2
-------------------------------------------
2018-04-10 16:17:36,231+0300 INFO (jsonrpc/4) [api.virt] START snapshot(snapDrives=[{'baseVolumeID': 'de39eb44-28c8-4a0d-ace1-27e95ddd5611', 'domainID': 'f3d0fa77-a26f-4e99-99db-2d0a1770acbe', 'volumeID': '2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831', 'imageID': '3f032b55-2d3a-4a64-b061-66c525093be2'}], snapMemory=None, frozen=True) from=::ffff:10.35.162.9,55622, flow_id=30e59408-257b-4f6e-9314-c8cb2c67af76, vmId=87a3516c-a6ce-4afc-abb8-8bc2eda67e02 (api:46)
2018-04-10 16:17:36,233+0300 ERROR (jsonrpc/4) [virt.vm] (vmId='87a3516c-a6ce-4afc-abb8-8bc2eda67e02') The base volume doesn't exist: {'device': 'disk', 'domainID': 'f3d0fa77-a26f-4e99-99db-2d0a1770acbe', 'volumeID': 'de39eb44-28c8-4a0d-ace1-27e95ddd5611', 'imageID': '3f032b55-2d3a-4a64-b061-66c525093be2'} (vm:4404)
2018-04-10 16:17:36,234+0300 INFO (jsonrpc/4) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:10.35.162.9,55622, flow_id=30e59408-257b-4f6e-9314-c8cb2c67af76, vmId=87a3516c-a6ce-4afc-abb8-8bc2eda67e02 (api:52)
2018-04-10 16:17:36,235+0300 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 48) in 0.00 seconds (__init__:573)
2018-04-10 16:17:36,359+0300 DEBUG (check/loop) [storage.check] START check u'/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge10__volume__1/a3f26fbd-2a9b-427b-897a-83289b232ac0/dom_md/metadata' (delay=0.00) (check:289)
2018-04-10 16:17:36,426+0300 DEBUG (jsonrpc/1) [storage.TaskManager.Task] (Task='1239be77-5ab9-4472-95d2-33367be75fb6') moving from state init -> state preparing (task:602)
2018-04-10 16:17:36,429+0300 INFO (jsonrpc/1) [vdsm.api] START getQemuImageInfo(sdUUID='f3d0fa77-a26f-4e99-99db-2d0a1770acbe', spUUID='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', imgUUID='3f032b55-2d3a-4a64-b061-66c525093be2', volUUID='2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831', options=None) from=::ffff:10.35.162.9,55622, flow_id=30e59408-257b-4f6e-9314-c8cb2c67af76, task_id=1239be77-5ab9-4472-95d2-33367be75fb6 (api:46)
2018-04-10 16:17:36,431+0300 DEBUG (jsonrpc/1) [storage.ResourceManager] Trying to register resource '00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe' for lock type 'shared' (resourceManager:495)
2018-04-10 16:17:36,432+0300 DEBUG (jsonrpc/1) [storage.ResourceManager] Resource '00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe' is free. Now locking as 'shared' (1 active user) (resourceManager:552)
2018-04-10 16:17:36,433+0300 DEBUG (jsonrpc/1) [storage.ResourceManager.Request] (ResName='00_storage.f3d0fa77-a26f-4e99-99db-2d0a1770acbe', ReqID='0a00410f-d34e-4103-84a3-2daab3c8596b') Granted request (resourceManager:222)

VDSM log - new SPM after step 2 - storage-ge10-vdsm1:
-------------------------------------------
2018-04-10 16:17:39,623+0300 DEBUG (check/loop) [storage.asyncevent] Process <subprocess32.Popen object at 0x7f0c7434eed0> terminated (count=1) (asyncevent:659)
2018-04-10 16:17:39,625+0300 DEBUG (check/loop) [storage.check] FINISH check '/dev/9df48d7d-f722-4b82-a9f0-1cbb477dbfbf/metadata' (rc=0, elapsed=0.08) (check:359)
2018-04-10 16:17:40,002+0300 DEBUG (tasks/7) [storage.Misc.excCmd] FAILED: <err> = ' Logical Volume "b6198269-3e83-40d8-83e8-81fa6a6b37bc" already exists in volume group "57635752-238d-4948-81f0-e4f010845d16"\n '; <rc> = 5 (commands:86)
2018-04-10 16:17:40,004+0300 ERROR (tasks/7) [storage.Volume] Failed to create volume /rhev/data-center/mnt/blockSD/57635752-238d-4948-81f0-e4f010845d16/images/3f032b55-2d3a-4a64-b061-66c525093be2/b6198269-3e83-40d8-83e8-81fa6a6b37bc: Cannot create Logical Volume: 'vgname=57635752-238d-4948-81f0-e4f010845d16 lvname=b6198269-3e83-40d8-83e8-81fa6a6b37bc err=[\' Logical Volume "b6198269-3e83-40d8-83e8-81fa6a6b37bc" already exists in volume group "57635752-238d-4948-81f0-e4f010845d16"\']' (volume:1209)
2018-04-10 16:17:40,005+0300 ERROR (tasks/7) [storage.Volume] Unexpected error (volume:1246)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1153, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
CannotCreateLogicalVolume: Cannot create Logical Volume: 'vgname=57635752-238d-4948-81f0-e4f010845d16 lvname=b6198269-3e83-40d8-83e8-81fa6a6b37bc err=[\' Logical Volume "b6198269-3e83-40d8-83e8-81fa6a6b37bc" already exists in volume group "57635752-238d-4948-81f0-e4f010845d16"\']'
2018-04-10 16:17:40,015+0300 ERROR (tasks/7) [storage.Image] Unexpected error (image:420)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 403, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 801, in createVolume
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 1206, in create
    initialSize=initialSize)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py", line 505, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1153, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)

Engine log:
---------------------------
2018-04-10 16:17:35,122+03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [30e59408-257b-4f6e-9314-c8cb2c67af76] Command 'LiveMigrateDisk' (id: '7e17d9c0-d5b8-49f3-8435-ea32bc51d8b3') waiting on child command id: 'd9d50bf8-f4c9-4757-8bd3-d584c44f4300' type:'CreateAllSnapshotsFromVm' to complete
2018-04-10 16:17:36,189+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' successfully.
2018-04-10 16:17:36,225+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] START, SnapshotVDSCommand(HostName = host_mixed_2, SnapshotVDSCommandParameters:{hostId='fea2403a-5581-4920-80db-a015412f72ef', vmId='87a3516c-a6ce-4afc-abb8-8bc2eda67e02'}), log id: 31ecf495
2018-04-10 16:17:36,238+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Failed in 'SnapshotVDS' method
2018-04-10 16:17:36,247+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command SnapshotVDS failed: Snapshot failed
2018-04-10 16:17:36,247+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand' return value 'StatusOnlyReturn [status=Status [code=48, message=Snapshot failed]]'
2018-04-10 16:17:36,247+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] HostName = host_mixed_2
2018-04-10 16:17:36,247+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Command 'SnapshotVDSCommand(HostName = host_mixed_2, SnapshotVDSCommandParameters:{hostId='fea2403a-5581-4920-80db-a015412f72ef', vmId='87a3516c-a6ce-4afc-abb8-8bc2eda67e02'})' execution failed: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48
2018-04-10 16:17:36,247+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] FINISH, SnapshotVDSCommand, log id: 31ecf495
2018-04-10 16:17:36,248+03 WARN [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Could not perform live snapshot due to error, VM will still be configured to the new created snapshot: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, code = 48 (Failed with error SNAPSHOT_FAILED and code 48)
2018-04-10 16:17:36,265+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] EVENT_ID: USER_CREATE_LIVE_SNAPSHOT_FINISHED_FAILURE(170), Failed to create live snapshot 'vm10042018_Disk2 Auto-generated for Live Storage Migration' for VM 'vm10042018'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
2018-04-10 16:17:36,281+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand' successfully.
2018-04-10 16:17:36,288+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', ignoreFailoverLimit='false', storageDomainId='f3d0fa77-a26f-4e99-99db-2d0a1770acbe', imageGroupId='3f032b55-2d3a-4a64-b061-66c525093be2', imageId='2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831'}), log id: 2467dd25
2018-04-10 16:17:36,294+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] START, GetVolumeInfoVDSCommand(HostName = host_mixed_1, GetVolumeInfoVDSCommandParameters:{hostId='f6e3b61d-a07a-40f1-820d-d7fd9c67f6f7', storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', storageDomainId='f3d0fa77-a26f-4e99-99db-2d0a1770acbe', imageGroupId='3f032b55-2d3a-4a64-b061-66c525093be2', imageId='2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831'}), log id: 360439e7
2018-04-10 16:17:36,399+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@c1adbc9c, log id: 360439e7
2018-04-10 16:17:36,399+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@c1adbc9c, log id: 2467dd25
2018-04-10 16:17:36,416+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetQemuImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] START, GetQemuImageInfoVDSCommand(HostName = host_mixed_2, GetVolumeInfoVDSCommandParameters:{hostId='fea2403a-5581-4920-80db-a015412f72ef', storagePoolId='a9e2b4b2-6c07-4af8-8fb0-5675878e2376', storageDomainId='f3d0fa77-a26f-4e99-99db-2d0a1770acbe', imageGroupId='3f032b55-2d3a-4a64-b061-66c525093be2', imageId='2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831'}), log id: 1b117437
2018-04-10 16:17:36,957+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetQemuImageInfoVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [30e59408-257b-4f6e-9314-c8cb2c67af76] FINISH, GetQemuImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.QemuImageInfo@5ee8a180, log id: 1b117437
2018-04-10 16:17:36,979+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] Lock freed to object 'EngineLock:{exclusiveLocks='[87a3516c-a6ce-4afc-abb8-8bc2eda67e02=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
2018-04-10 16:17:36,990+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [] EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Failed to complete snapshot 'vm10042018_Disk2 Auto-generated for Live Storage Migration' creation for VM 'vm10042018'.
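For readers less familiar with the flow, here is a minimal, purely illustrative Python sketch (not vdsm's actual code; the helper name and the active_volumes structure are hypothetical) of the check behind the "The base volume doesn't exist" error above: the engine asks the host to snapshot on top of 'baseVolumeID', and the host refuses when the running VM's drive no longer uses that volume - which is plausibly the state the failed first attempt left behind.

def validate_snap_drives(snap_drives, active_volumes):
    """Reject a snapshot request whose base volume is not the active layer.

    snap_drives: list of dicts as sent by the engine (see the START snapshot
                 log line above).
    active_volumes: hypothetical mapping imageID -> volumeID currently used
                    by the running VM for that image.
    """
    for drive in snap_drives:
        current = active_volumes.get(drive["imageID"])
        if current != drive["baseVolumeID"]:
            # Mirrors the failure in the second LSM attempt: the aborted
            # first attempt left the VM on a different volume, so the
            # requested base volume is no longer found.
            raise LookupError("The base volume doesn't exist: %r" % drive)


# Values taken from the log above; the active volume is assumed (for this
# illustration) to be the one created by the first attempt.
snap_drives = [{
    "imageID": "3f032b55-2d3a-4a64-b061-66c525093be2",
    "domainID": "f3d0fa77-a26f-4e99-99db-2d0a1770acbe",
    "baseVolumeID": "de39eb44-28c8-4a0d-ace1-27e95ddd5611",
    "volumeID": "2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831",
}]
active_volumes = {"3f032b55-2d3a-4a64-b061-66c525093be2":
                  "2f44ca2a-e5f2-4b7c-8a00-b79a0ee6a831"}
validate_snap_drives(snap_drives, active_volumes)  # raises LookupError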
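Steps 2 and 3 (blocking and unblocking the source NFS domain) were done with iptables; a rough automation sketch is below. The server address is a placeholder and the DROP rule is only an assumption about how the outage was simulated - adjust to the actual environment.

import subprocess

NFS_SERVER = "10.35.0.1"  # placeholder: address of the NFS storage server

def set_nfs_block(blocked):
    """Insert or remove a DROP rule towards the NFS server on the host."""
    action = "-I" if blocked else "-D"
    subprocess.check_call(
        ["iptables", action, "OUTPUT", "-d", NFS_SERVER, "-j", "DROP"])

# set_nfs_block(True)   # step 2: simulate the storage outage during LSM
# set_nfs_block(False)  # step 3: restore connectivity before retrying LSM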
Looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1559929
Some notes:

In the first attempt, after blocking access to the storage, the VM went into an unresponsive state, and snapshot creation was not completed and was not reported as such (there are no errors in vdsm).

So there are two issues here:
1. Why was the snapshot creation considered successful even though the live snapshot part did not take place?
2. LSM should not have continued after an unsuccessful auto-generated (AG) snapshot creation - this issue is currently being fixed [1]

So I believe the second failure is due to the snapshot not being created properly in the first attempt.

The error:
Cannot create Logical Volume: 'vgname=57635752-238d-4948-81f0-e4f010845d16 lvname=b6198269-3e83-40d8-83e8-81fa6a6b37bc err=[\' Logical Volume "b6198269-3e83-40d8-83e8-81fa6a6b37bc" already exists in volume group "57635752-238d-4948-81f0-e4f010845d16"\']'

is due to the first attempt reaching the CreateImagePlaceHolder stage (again, despite the snapshot not really being created):

2018-04-10 16:08:30,016+0300 INFO (tasks/7) [storage.LVM] Creating LV (vg=57635752-238d-4948-81f0-e4f010845d16, lv=b6198269-3e83-40d8-83e8-81fa6a6b37bc, size=5120m, activate=True, contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1139)

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1497355
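To make the "already exists" part concrete: the placeholder LV created by the first attempt is still present in the destination VG, so the retry's createLV collides with it. Below is a small diagnostic sketch (for inspection only, not a supported cleanup procedure); it assumes the lvm2 tools are available on the host and that the leftover LV still carries the OVIRT_VOL_INITIALIZING tag seen in the log.

import subprocess

def leftover_lvs(vg_name):
    """Return LV names in vg_name that still carry the initializing tag."""
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "lv_name,lv_tags", vg_name],
        universal_newlines=True)
    leftovers = []
    for line in out.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        tags = fields[1] if len(fields) > 1 else ""
        if "OVIRT_VOL_INITIALIZING" in tags:
            leftovers.append(name)
    return leftovers

print(leftover_lvs("57635752-238d-4948-81f0-e4f010845d16"))
# Expected to list b6198269-3e83-40d8-83e8-81fa6a6b37bc if the placeholder
# from the first attempt was never cleaned up.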
How do we proceed then, Benny?
(In reply to Benny Zlotnik from comment #2)
> Some notes:
>
> In the first attempt, after blocking access to the storage, the VM went
> into an unresponsive state, and snapshot creation was not completed and
> was not reported as such (there are no errors in vdsm).
>
> So there are two issues here:
> 1. Why was the snapshot creation considered successful even though the
> live snapshot part did not take place?
> 2. LSM should not have continued after an unsuccessful auto-generated (AG)
> snapshot creation - this issue is currently being fixed [1]
>
> So I believe the second failure is due to the snapshot not being created
> properly in the first attempt.
>
> The error:
> Cannot create Logical Volume: 'vgname=57635752-238d-4948-81f0-e4f010845d16
> lvname=b6198269-3e83-40d8-83e8-81fa6a6b37bc err=[\' Logical Volume
> "b6198269-3e83-40d8-83e8-81fa6a6b37bc" already exists in volume group
> "57635752-238d-4948-81f0-e4f010845d16"\']'
>
> is due to the first attempt reaching the CreateImagePlaceHolder stage
> (again, despite the snapshot not really being created):
>
> 2018-04-10 16:08:30,016+0300 INFO (tasks/7) [storage.LVM] Creating LV
> (vg=57635752-238d-4948-81f0-e4f010845d16,
> lv=b6198269-3e83-40d8-83e8-81fa6a6b37bc, size=5120m, activate=True,
> contiguous=False, initialTags=('OVIRT_VOL_INITIALIZING',)) (lvm:1139)
>
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1497355

Kevin, can we retest this? Issue 2 has been fixed, and I vaguely remember we discussed checking whether issue 1 reproduces consistently.
Hi Benny, if you think this bug was fixed, please move it to ON_QA and we'll test. Thanks
While trying to verify this bug I encountered Bug 1650985. As the scenario of this bug and Bug 1650985 is the same, I marked this bug's verification as depending on the resolution of Bug 1650985.
As Bug 1650985 is targeted to 4.3.4, this cannot be verified until then.
Is this bug fixed in ovirt-engine-4.3.5? Can this move to QE?
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: infra
Marking as blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1695585
This bug is blocked by bug 1695585, which is in NEW state and not targeted.

Please retarget this bug out of 4.3.5.
(In reply to Avihai from comment #13)
> This bug is blocked by bug 1695585, which is in NEW state and not targeted.
>
> Please retarget this bug out of 4.3.5.

Clearing the target milestone to be triaged again.
Verified on:
engine-4.4.0-0.13
vdsm-4.40.0-164
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.