Bug 1430358 - After restarting the SPM vdsm process during a cold merge, other snapshots cannot be previewed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.19.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.3
Target Release: 4.19.19
Assignee: Ala Hino
QA Contact: Carlos Mestre González
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-08 12:52 UTC by Carlos Mestre González
Modified: 2017-07-06 13:31 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:31:41 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: exception+
ylavi: planning_ack+
amureini: devel_ack+
ratamir: testing_ack+


Attachments
vdsm and engine logs (4.05 MB, application/x-gzip)
2017-03-08 12:56 UTC, Carlos Mestre González


Links
System        ID     Private  Priority   Status  Summary                                               Last Updated
oVirt gerrit  76833  0        master     MERGED  cold-merge: Minimize illegal chain during cold merge  2020-10-19 17:20:24 UTC
oVirt gerrit  76869  0        master     MERGED  cold-merge: Refactor metadata update code             2020-10-19 17:20:24 UTC
oVirt gerrit  77610  0        ovirt-4.1  MERGED  cold-merge: Refactor metadata update code             2020-10-19 17:20:24 UTC
oVirt gerrit  77611  0        ovirt-4.1  MERGED  cold-merge: Minimize illegal chain during cold merge  2020-10-19 17:20:24 UTC

Description Carlos Mestre González 2017-03-08 12:52:38 UTC
Description of problem:
Restarting the vdsm process on the SPM during a cold merge makes it impossible to preview the remaining snapshots afterwards.

Version-Release number of selected component (if applicable):
4.1.1.3-0.1.el7

How reproducible:
100% (twice)

Steps to Reproduce:
On iSCSI:
1. Create a VM (in this case with multiple disks: 1 OS disk and 3 additional disks with a file system)
2. Create 3 snapshots
3. Remove the middle snapshot
4. Immediately after the operation starts, restart the vdsm process on the SPM (systemctl restart vdsmd)
5. The remove operation fails.
6. Try to preview the last snapshot
Actual results:
Failed to complete Snapshot-Preview snapshot_18901_iscsi_2 for VM vm_TestCase18901_REST_ISCS_0812575676.
VDSM host_mixed_3 command HSMGetAllTasksStatusesVDS failed: Error creating a new volume

Expected results:
Even if the snapshot removal fails, we should still be able to preview the remaining snapshots normally.

Comment 1 Carlos Mestre González 2017-03-08 12:56:44 UTC
Created attachment 1261267
vdsm and engine logs

- Remove snapshot_18901_iscsi_1
- Immediately restart vdsm on host_mixed_2, which is the SPM (systemctl restart)
- Operation fails.
- Try to preview snapshot_18901_iscsi_2 ===> FAILS (Now SPM is host_mixed_3)

Comment 2 Liron Aravot 2017-03-23 15:39:06 UTC
The new cold merge flow first changes the base volume to ILLEGAL, then performs the prepare/merge phases, and finally changes the volume status back to LEGAL (as part of the finalizeMerge operation).
In this scenario vdsm was restarted during the merge, so the base volume remained ILLEGAL, which causes further operations on the chain (like createVolume) to fail.

It seems we shouldn't set the volume to ILLEGAL at all, provided we first verify that the volume can indeed be safely used even after a failure during any of the operations.


2017-03-08 13:14:05,044+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] In recovery, ignoring 'SDM.merge' in bridge with {'subchain_info': {'img_id': 'f45bba2d-7e2f-4d02-9c30-a71818c1a20a', 'sd_id': '9eed63d8-bf77-442a-b4a0-fd313324d783', 'top_id': '45d68105-965f-4f93-b8f0-485ba6809700', 'base_id': '5f2c7357-c772-4e8e-acf4-c190c0e0d7ed', 'base_generation': 0}, 'job_id': 'd69b099a-1d8b-4897-946e-65cc2dd5667a'} (__init__:527)


2017-03-08 13:16:41,941+0200 ERROR (tasks/4) [storage.VolumeManifest] Unexpected error (volume:580)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 578, in prepare
    chainrw=chainrw, setrw=setrw)
  File "/usr/share/vdsm/storage/volume.py", line 557, in prepare
    raise se.prepareIllegalVolumeError(self.volUUID)
prepareIllegalVolumeError: Cannot prepare illegal volume: ('5f2c7357-c772-4e8e-acf4-c190c0e0d7ed',)
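
To make the failure mode above easier to follow, here is a minimal, self-contained Python sketch of it. This is a toy model, not vdsm code: the `Volume` class, `cold_merge`, `ProcessRestarted` and `PrepareIllegalVolume` are hypothetical names introduced only for illustration, and the "restart" is simulated with an exception.

```python
LEGAL = "LEGAL"
ILLEGAL = "ILLEGAL"


class PrepareIllegalVolume(Exception):
    """Illustrative stand-in for the prepareIllegalVolumeError in the traceback."""


class ProcessRestarted(Exception):
    """Simulates vdsm being restarted in the middle of the merge."""


class Volume:
    """Toy volume that only tracks its legality (hypothetical model)."""

    def __init__(self, vol_id):
        self.vol_id = vol_id
        self.legality = LEGAL

    def prepare(self):
        # Same kind of check as in the traceback: an illegal volume
        # cannot be prepared, so flows that need it fail.
        if self.legality == ILLEGAL:
            raise PrepareIllegalVolume(self.vol_id)


def cold_merge(base, interrupt=False):
    base.legality = ILLEGAL       # step 1: base volume marked ILLEGAL
    if interrupt:
        raise ProcessRestarted()  # the finalize step below never runs
    # step 2: the prepare/merge phases would run here
    base.legality = LEGAL         # step 3: finalize restores LEGAL


def simulate_restart_during_merge():
    base = Volume("base-vol")
    try:
        cold_merge(base, interrupt=True)
    except ProcessRestarted:
        pass  # nothing on this path sets the volume back to LEGAL
    try:
        base.prepare()
    except PrepareIllegalVolume:
        return base.legality, "prepare failed"
    return base.legality, "prepare ok"
```

In this model the interrupted run leaves the base volume ILLEGAL and the later `prepare()` raises, which is the same shape as the error shown in the traceback.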

Comment 3 Liron Aravot 2017-03-26 07:03:54 UTC
Or alternatively - we can set it back to LEGAL again.
Ala/Adam - what's your take on that?

Comment 4 Adam Litke 2017-03-27 13:30:15 UTC
Hmm, it might make sense to leave the base volume LEGAL since the chain and data remain valid even during an interrupted merge.  We just need to make sure that the engine marks the snapshot as ILLEGAL (in the DB) so we do not attempt to preview or revert to the partially deleted snapshot.

In order to start leaving the base vol LEGAL we need to audit the entity polling that engine does to determine the status of a cold merge when there is no host job reported.  AFAIK the current design checks for base volume legality.  If we can move that to using volume generation validation instead then we should be covered.
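
As a rough illustration of what generation-based validation could look like, here is a small Python sketch. It assumes that the base volume's generation is bumped by exactly one when the merge completes successfully and is left unchanged otherwise; that assumption, the function name `merge_outcome` and the returned strings are all hypothetical and not taken from the engine or vdsm code.

```python
def merge_outcome(expected_generation, current_generation):
    """Classify a cold merge for which no host job is reported.

    expected_generation: the generation sent with the merge request
        (compare 'base_generation' in the SDM.merge log line).
    current_generation: the generation read back from the base volume.
    Assumption: a successful merge increments the generation by one.
    """
    if current_generation == expected_generation + 1:
        return "succeeded"
    if current_generation == expected_generation:
        return "failed"
    # Any other value means something else touched the volume.
    return "unknown"
```

With a check of this kind, the outcome of an interrupted merge could be determined without relying on the base volume's legality flag.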

Comment 6 Ala Hino 2017-05-31 19:35:40 UTC
Carlos,

This is a vdsm bug.
Please change the Product to vdsm and provide the version where this bug was observed.

Comment 7 rhev-integ 2017-06-09 09:57:59 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.18' doesn't contain patch 'https://gerrit.ovirt.org/77610']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.18

For more info please contact: infra

Comment 8 Carlos Mestre González 2017-06-28 15:08:42 UTC
verified on: vdsm-4.19.20-1.el7ev.x86_64  (rhevm-4.1.3)

