Bug 1430358

Summary: After restarting the SPM vdsm process during a cold merge, other snapshots cannot be previewed
Product: [oVirt] vdsm
Reporter: Carlos Mestre González <cmestreg>
Component: Core
Assignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE
QA Contact: Carlos Mestre González <cmestreg>
Severity: high
Docs Contact:
Priority: high
Version: 4.19.0
CC: ahino, alitke, amureini, bugs, cmestreg, stirabos, tnisan, ylavi
Target Milestone: ovirt-4.1.3
Flags:
  rule-engine: ovirt-4.1+
  rule-engine: exception+
  ylavi: planning_ack+
  amureini: devel_ack+
  ratamir: testing_ack+
Target Release: 4.19.19
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-06 13:31:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  vdsm and engine logs (flags: none)

Description Carlos Mestre González 2017-03-08 12:52:38 UTC
Description of problem:
Restarting the vdsm process on the SPM during a cold merge leaves the snapshots impossible to preview.

Version-Release number of selected component (if applicable):
4.1.1.3-0.1.el7

How reproducible:
100% (twice)

Steps to Reproduce:
On iSCSI:
1. Create a VM (in this case with multiple disks: 1 OS disk and 3 additional disks with a file system)
2. Create 3 snapshots
3. Remove the middle snapshot
4. Immediately after the operation starts, restart the vdsm process on the SPM (systemctl restart vdsmd)
5. The remove operation fails.
6. Try to preview the last snapshot

Actual results:
Failed to complete Snapshot-Preview snapshot_18901_iscsi_2 for VM vm_TestCase18901_REST_ISCS_0812575676.
VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume

Expected results:
Even if removing the snapshot fails, we should still be able to preview the other snapshots normally.

Comment 1 Carlos Mestre González 2017-03-08 12:56:44 UTC
Created attachment 1261267
vdsm and engine logs

- Remove snapshot_18901_iscsi_1
- Immediately restart vdsm on host_mixed_2, which is the SPM (systemctl restart)
- The operation fails.
- Try to preview snapshot_18901_iscsi_2 ===> FAILS (the SPM is now host_mixed_3)

Comment 2 Liron Aravot 2017-03-23 15:39:06 UTC
The new cold merge flow first changes the base volume to illegal, then performs the prepare/merge phases, and finally changes the volume status back to legal (as part of the finalizeMerge operation).
In this scenario vdsm was restarted during the merge, so the volume remained illegal, which fails further operations on the chain (like createVolume).

It seems we shouldn't set the volume to ILLEGAL (after verifying that the volume can indeed be safely used even after a failure during any of the operations).
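
To illustrate the failure mode, here is a minimal sketch of the flow described above. It is a toy model, not vdsm code: the `Volume` class, `cold_merge` function, `ProcessRestarted` exception and the `interrupt` flag are all hypothetical names used only to show how an interruption between the first and last step leaves the base volume illegal.

```python
# Toy model of the cold merge flow; every name here is hypothetical
# and none of this is taken from the vdsm code base.
LEGAL, ILLEGAL = "LEGAL", "ILLEGAL"


class PrepareIllegalVolumeError(Exception):
    """Stands in for the prepareIllegalVolumeError seen in the traceback."""


class ProcessRestarted(Exception):
    """Stands in for vdsm being restarted in the middle of the merge."""


class Volume:
    def __init__(self, vol_id):
        self.vol_id = vol_id
        self.legality = LEGAL

    def prepare(self):
        # An illegal volume cannot be prepared, so later operations
        # on the chain fail.
        if self.legality == ILLEGAL:
            raise PrepareIllegalVolumeError(self.vol_id)


def cold_merge(base, interrupt=False):
    base.legality = ILLEGAL      # the base volume is marked illegal first
    if interrupt:
        raise ProcessRestarted() # restart happens before finalize runs
    # the prepare and merge phases would run here
    base.legality = LEGAL        # finalize restores legality


base = Volume("5f2c7357-c772-4e8e-acf4-c190c0e0d7ed")
try:
    cold_merge(base, interrupt=True)
except ProcessRestarted:
    pass

try:
    base.prepare()
    preview_ok = True
except PrepareIllegalVolumeError:
    preview_ok = False
```

In this model the interrupted merge never reaches the step that restores legality, so `preview_ok` ends up false, which matches the symptom reported here.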


2017-03-08 13:14:05,044+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] In recovery, ignoring 'SDM.merge' in bridge with {'subchain_info': {'img_id': 'f45bba2d-7e2f-4d02-9c30-a71818c1a20a', 'sd_id': '9eed63d8-bf77-442a-b4a0-fd313324d783', 'top_id': '45d68105-965f-4f93-b8f0-485ba6809700', 'base_id': '5f2c7357-c772-4e8e-acf4-c190c0e0d7ed', 'base_generation': 0}, 'job_id': 'd69b099a-1d8b-4897-946e-65cc2dd5667a'} (__init__:527)


2017-03-08 13:16:41,941+0200 ERROR (tasks/4) [storage.VolumeManifest] Unexpected error (volume:580)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 578, in prepare
    chainrw=chainrw, setrw=setrw)
  File "/usr/share/vdsm/storage/volume.py", line 557, in prepare
    raise se.prepareIllegalVolumeError(self.volUUID)
prepareIllegalVolumeError: Cannot prepare illegal volume: ('5f2c7357-c772-4e8e-acf4-c190c0e0d7ed',)

Comment 3 Liron Aravot 2017-03-26 07:03:54 UTC
Or alternatively - we can set it back to LEGAL again.
Ala/Adam - what's your take on that?

Comment 4 Adam Litke 2017-03-27 13:30:15 UTC
Hmm, it might make sense to leave the base volume LEGAL since the chain and data remain valid even during an interrupted merge.  We just need to make sure that the engine marks the snapshot as ILLEGAL (in the DB) so we do not attempt to preview or revert to the partially deleted snapshot.

In order to start leaving the base vol LEGAL we need to audit the entity polling that engine does to determine the status of a cold merge when there is no host job reported.  AFAIK the current design checks for base volume legality.  If we can move that to using volume generation validation instead then we should be covered.
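
A rough sketch of what generation-based validation could look like on the polling side. This is an assumption-laden illustration, not engine code: `merge_outcome` and `MergeOutcome` are hypothetical names, and the sketch assumes that a completed merge bumps the base volume generation by exactly one while an interrupted merge leaves it unchanged.

```python
from enum import Enum


class MergeOutcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


def merge_outcome(expected_generation, current_generation):
    """Infer a merge outcome from the base volume generation.

    Assumption (not confirmed in this report): a completed merge
    increments the generation by one, and an interrupted merge
    leaves it at the value sent with the request (base_generation).
    """
    if current_generation == expected_generation + 1:
        return MergeOutcome.SUCCEEDED
    if current_generation == expected_generation:
        return MergeOutcome.FAILED
    # Any other value means something else modified the volume.
    return MergeOutcome.UNKNOWN
```

With a check of this shape, the engine would not need to read the base volume legality at all when no host job is reported.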

Comment 6 Ala Hino 2017-05-31 19:35:40 UTC
Carlos,

This is a vdsm bug.
Please change the Product to vdsm and provide the version in which this bug was observed.

Comment 7 rhev-integ 2017-06-09 09:57:59 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.18' doesn't contain patch 'https://gerrit.ovirt.org/77610']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.18

For more info please contact: infra

Comment 8 Carlos Mestre González 2017-06-28 15:08:42 UTC
verified on: vdsm-4.19.20-1.el7ev.x86_64  (rhevm-4.1.3)