Description of problem:
During a cold merge, restarting the vdsm process on the SPM makes it impossible to preview the snapshots afterwards.

Version-Release number of selected component (if applicable):
4.1.1.3-0.1.el7

How reproducible:
100% (twice)

Steps to Reproduce:
On iSCSI:
1. Create a VM (in this case with multiple disks: 1 OS disk and 3 additional disks with a file system)
2. Create 3 snapshots
3. Remove the middle snapshot
4. Immediately after the operation starts, restart the vdsm process on the SPM (systemctl restart vdsmd)
5. The remove operation fails.
6. Try to preview the last snapshot

Actual results:
Failed to complete Snapshot-Preview snapshot_18901_iscsi_2 for VM vm_TestCase18901_REST_ISCS_0812575676. VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume

Expected results:
Even if removing the snapshot fails, we should still be able to preview the snapshots normally.
Created attachment 1261267
vdsm and engine logs

- Remove snapshot_18901_iscsi_1
- Immediately restart vdsm on host_mixed_2, which is the SPM (systemctl restart)
- Operation fails.
- Try to preview snapshot_18901_iscsi_2 ===> FAILS (now the SPM is host_mixed_3)
The new cold merge flow first changes the base volume to illegal, then performs the prepare/merge phases, and finally changes the volume status back to legal (as part of the finalizeMerge operation). In this scenario vdsm was restarted during the merge, so the volume remained illegal, which causes further operations on the chain (like createVolume) to fail. It seems we shouldn't set the volume to ILLEGAL at all (after verifying that the volume can indeed be safely used even after a failure during any of the operations).

2017-03-08 13:14:05,044+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] In recovery, ignoring 'SDM.merge' in bridge with {'subchain_info': {'img_id': 'f45bba2d-7e2f-4d02-9c30-a71818c1a20a', 'sd_id': '9eed63d8-bf77-442a-b4a0-fd313324d783', 'top_id': '45d68105-965f-4f93-b8f0-485ba6809700', 'base_id': '5f2c7357-c772-4e8e-acf4-c190c0e0d7ed', 'base_generation': 0}, 'job_id': 'd69b099a-1d8b-4897-946e-65cc2dd5667a'} (__init__:527)

2017-03-08 13:16:41,941+0200 ERROR (tasks/4) [storage.VolumeManifest] Unexpected error (volume:580)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 578, in prepare
    chainrw=chainrw, setrw=setrw)
  File "/usr/share/vdsm/storage/volume.py", line 557, in prepare
    raise se.prepareIllegalVolumeError(self.volUUID)
prepareIllegalVolumeError: Cannot prepare illegal volume: ('5f2c7357-c772-4e8e-acf4-c190c0e0d7ed',)
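To make the failure mode above easier to follow, here is a toy model of the three-step flow (mark ILLEGAL, merge, mark LEGAL again) and of what an interruption in the middle leaves behind. This is only an illustration: Volume, cold_merge, Interrupted and PrepareIllegalVolumeError are hypothetical names made up for this sketch, not vdsm classes or functions, and the real merge obviously does much more.

```python
# Toy model only: every name here is hypothetical, none of this is vdsm code.
LEGAL = "LEGAL"
ILLEGAL = "ILLEGAL"


class PrepareIllegalVolumeError(Exception):
    """Mimics the error seen in the traceback for an illegal volume."""


class Interrupted(Exception):
    """Stands in for vdsm being restarted in the middle of the merge."""


class Volume:
    def __init__(self, vol_id):
        self.vol_id = vol_id
        self.legality = LEGAL

    def prepare(self):
        # Later operations on the chain refuse to use an illegal volume.
        if self.legality == ILLEGAL:
            raise PrepareIllegalVolumeError(self.vol_id)
        return True


def cold_merge(base, interrupt_during_merge=False):
    base.legality = ILLEGAL      # step 1: mark the base volume illegal
    if interrupt_during_merge:
        raise Interrupted()      # restart happens here, finalize never runs
    # step 2: the prepare/merge phases would run here
    base.legality = LEGAL        # step 3: finalize restores legality


base = Volume("base-volume")
try:
    cold_merge(base, interrupt_during_merge=True)
except Interrupted:
    pass

# The base volume is now stuck as ILLEGAL, so a later base.prepare()
# raises PrepareIllegalVolumeError.
```

In this model nothing ever resets the legality after the interruption, which matches what the second log excerpt shows for volume 5f2c7357-c772-4e8e-acf4-c190c0e0d7ed.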
Or alternatively - we can set it back to LEGAL again. Ala/Adam - what's your take on that?
Hmm, it might make sense to leave the base volume LEGAL since the chain and data remain valid even during an interrupted merge. We just need to make sure that the engine marks the snapshot as ILLEGAL (in the DB) so we do not attempt to preview or revert to the partially deleted snapshot. In order to start leaving the base vol LEGAL we need to audit the entity polling that engine does to determine the status of a cold merge when there is no host job reported. AFAIK the current design checks for base volume legality. If we can move that to using volume generation validation instead then we should be covered.
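A rough sketch of what generation-based polling could look like, for discussion only. It assumes that a successfully completed merge bumps the base volume generation by exactly one relative to the base_generation sent with the job (0 in the log above), and that an interrupted merge leaves it unchanged. That assumption would need to be checked against the actual code. merge_outcome is a hypothetical helper, not an existing engine or vdsm function, and generation wraparound is ignored here.

```python
def merge_outcome(expected_generation, current_generation):
    """Hypothetical helper: guess the result of a merge job that is no
    longer reported by any host, using only the base volume generation.

    Assumes a completed merge increments the generation by one and an
    interrupted merge leaves it untouched.
    """
    if current_generation == expected_generation + 1:
        return "SUCCEEDED"
    if current_generation == expected_generation:
        return "FAILED"
    # Anything else means someone else modified the volume; do not guess.
    return "UNKNOWN"


# With base_generation 0 as in the log, an unchanged generation would be
# treated as a failed merge.
status = merge_outcome(expected_generation=0, current_generation=0)
```

The point of this check is that it no longer depends on volume legality, so the base volume could stay LEGAL for the whole merge.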
Carlos, this is a vdsm bug. Please change the Product to vdsm and provide the version in which this bug was observed.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.18' doesn't contain patch 'https://gerrit.ovirt.org/77610']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.18

For more info please contact: infra
verified on: vdsm-4.19.20-1.el7ev.x86_64 (rhevm-4.1.3)