Bug 739933

Summary: VDSM - Storage: Failed activation of a corrupt non-master domain shouldn't return "Cannot find master domain"
Product: [Retired] oVirt Reporter: Daniel Paikov <dpaikov>
Component: vdsm Assignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecified CC: abaron, amureini, bazulay, danken, fsimonce, hateya, iheim, ykaul
Target Milestone: ---   
Target Release: 3.3.4   
Hardware: Unspecified   
OS: Linux   
Whiteboard: storage
Fixed In Version: vdsm-4.9.6-2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-17 07:49:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
vdsm.log none

Description Daniel Paikov 2011-09-20 13:33:35 UTC
Created attachment 524033
vdsm.log

* Corrupt a non-master domain by overwriting its metadata (echo 1 > metadata).
* Try to activate the domain.
* VDSM returns error 304 (Cannot find master domain) even though the pool has an active master.

Comment 2 Dan Kenigsberg 2011-09-20 14:01:03 UTC
I suspect this flow has changed due to bug 737329. Would you reproduce this with vdsm-4.9-104?

Comment 3 Dan Kenigsberg 2011-09-20 14:02:46 UTC
Daniel, please also describe the effects of this on RHEV-M; I would like to consider if this is a 3.0 blocker.

Comment 4 Daniel Paikov 2011-09-20 14:11:49 UTC
I opened it on 4.9-96 by mistake. I'm seeing this problem on 4.9-101, which is post bug #737329.

What I'm seeing in RHEVM is a pop-up dialog with the exact same text as shown in vdsm.log - "Cannot find master domain / Error 304".

Comment 7 Daniel Paikov 2011-10-09 11:24:55 UTC
It's a more general bug than we thought, so we should consider fixing it as early as possible:

* Domain doesn't have to be data - can be export, ISO, etc.
* Domain doesn't have to be corrupt - can be simply unreachable.
* Effect on RHEVM - forces SPM re-election even though there's nothing wrong, because the error code indicates a master problem that doesn't really exist.
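
A minimal sketch of the distinction described in the points above, not VDSM's actual code. It assumes a toy pool represented as a dict; the class names, the functions activate_domain and needs_spm_reelection, and the pool layout are all hypothetical. Only the code 304 and its message come from this report. The idea is that a failure to read a non-master domain is reported against that domain, so a caller that keys SPM re-election on the master error is not misled.

```python
# Toy model only: class names, functions, and the pool layout are
# hypothetical and do not mirror VDSM's actual code.

MASTER_NOT_FOUND = 304  # "Cannot find master domain", the code quoted in this report


class StorageError(Exception):
    """Base class for the illustrative storage errors."""
    code = None


class MasterDomainNotFound(StorageError):
    code = MASTER_NOT_FOUND


class DomainUnavailable(StorageError):
    """Hypothetical error for a corrupt or unreachable non-master domain."""
    code = None  # deliberately unset: the real code is not given in this report


def activate_domain(pool, sd_uuid):
    """Activate sd_uuid, blaming the master only when the master is at fault."""
    master = pool["master"]
    if not pool["domains"].get(master, {}).get("readable", False):
        raise MasterDomainNotFound(master)
    domain = pool["domains"].get(sd_uuid)
    if domain is None or not domain["readable"]:
        # The master is fine; report the problem against the domain itself.
        raise DomainUnavailable(sd_uuid)
    domain["active"] = True
    return domain


def needs_spm_reelection(error):
    """Caller-side check: only a master problem should trigger re-election."""
    return isinstance(error, StorageError) and error.code == MASTER_NOT_FOUND
```

With a readable master and an unreadable ISO domain, activate_domain raises DomainUnavailable and needs_spm_reelection returns False for it. The master error, and with it re-election, is reserved for the case where the master itself cannot be read.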

Comment 10 Eduardo Warszawski 2011-10-10 11:41:32 UTC
Minimal fix:
http://gerrit.usersys.redhat.com/#change,1014

Comment 12 Federico Simoncelli 2012-06-22 11:03:18 UTC
In the ovirt-3.1 branch as: ed335f1884e56b57c26c96db3d49af06bd665f06

Comment 13 Daniel Paikov 2012-07-04 13:30:29 UTC
Checked on upstream.