Bug 1122436

Summary: [TEXT] Storage Pool Manager role is flipping when export storage domain is not accessible.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Roman Hodain <rhodain>
Component: ovirt-engine
Assignee: Liron Aravot <laravot>
Status: CLOSED DEFERRED
QA Contact: Aharon Canan <acanan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: amureini, ecohen, iheim, laravot, lpeer, pdwyer, rbalakri, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Flags: scohen: needinfo+, scohen: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-09-14 15:11:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Roman Hodain 2014-07-23 09:12:33 UTC
Description of problem:
	When an export storage domain is not accessible and the list of VMs on
	this domain is refreshed, the SPM role starts flipping, even if the storage
	domain is later set as inactive by the engine.

Version-Release number of selected component (if applicable):
	rhevm-3.4.0-0.22.el6ev.noarch

How reproducible:
	100%

Steps to Reproduce:
	1. Register two hosts to a DC
	2. Create a data domain (iSCSI/FC)
	3. Create an export domain
	4. Stop the NFS service on the server exporting the NFS share
	5. Go to the Storage tab, choose the export domain and the VM Import sub-tab
	6. Click the refresh icon in the top right corner of the sub-tab a couple
	   of times.

Actual results:
	Many GetVmsInfoVDSCommand commands are generated before the export domain
	is marked as inactive. This leads to failures in the GetVmsInfoVDS method,
	which cause the SPM role to flip to another hypervisor.

Expected results:
	A failed GetVmsInfoVDSCommand on an export/ISO domain does not cause
	the SPM to flip.
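
	The requested behavior could be sketched roughly as below. This is a minimal illustration of the policy asked for in this report, not the engine's actual code: DomainType, FailureAction and ExportFailurePolicy are hypothetical names, and the choice to keep failing over for data domains is an assumption made only to contrast the two cases.

```java
// Hypothetical types for illustration only; none of these exist in ovirt-engine.
enum DomainType { DATA, EXPORT, ISO }

enum FailureAction { FAIL_OVER_SPM, MARK_DOMAIN_INACTIVE }

final class ExportFailurePolicy {
    private ExportFailurePolicy() {
    }

    /**
     * Decides what to do when a query such as GetVmsInfoVDSCommand fails.
     * Assumption: only a data domain failure justifies moving the SPM role;
     * an export/ISO domain failure just marks that domain inactive.
     */
    static FailureAction onQueryFailure(DomainType type) {
        switch (type) {
            case EXPORT:
            case ISO:
                return FailureAction.MARK_DOMAIN_INACTIVE;
            default:
                return FailureAction.FAIL_OVER_SPM;
        }
    }
}
```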

Additional info:
	It would also be beneficial not to let the user trigger more than one
	request at a time.

Comment 2 Allon Mureinik 2014-07-23 11:44:00 UTC
Liron, haven't you solved something similar already?

Comment 3 Liron Aravot 2014-07-24 09:17:04 UTC
Allon, as part of bug 958766 we decided to add the manual refresh button but to leave the behavior as is.

Comment 5 Sean Cohen 2014-07-29 05:47:04 UTC
Indeed, this was already addressed in RHEV 3.3:
 
"The automatic refresh is replaced with a manual refresh button, which decreases the number of failover attempts."

Closing as dup of bug 958766 
Sean

*** This bug has been marked as a duplicate of bug 958766 ***

Comment 6 Roman Hodain 2014-07-29 07:12:22 UTC
I would like to discuss this before we close it as duplicated. 

Bug https://bugzilla.redhat.com/show_bug.cgi?id=958766 was resolved, and the resolution was to remove the automatic refresh of the ISO/export domain. This works fine.

The case here is similar but not the same. The issue is in the manual refresh. Keep in mind that the RHEV environment can be managed by more than one admin. Many different people can trigger the domain refresh manually. We have already seen this in a customer environment. The problem is that this will block the environment not just for a couple of minutes; it can take much longer.

The refresh should not trigger the SPM flip. It should mark the SD as inactive. In the same way, we should not let the customer trigger the refresh operation more than once.

If a refresh operation is already running, do not trigger a new one; wait for the result of the one already running.

I am not sure how exactly this is implemented or what the obstacles are, but from the user's point of view it does not behave correctly.

Does this make sense?
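
The "wait for the result of the one already running" idea above can be sketched as request coalescing. This is only an illustration under stated assumptions: SingleFlightRefresh is a hypothetical helper, not part of ovirt-engine, and the Supplier stands in for whatever call actually lists the VMs on the export domain.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical helper: concurrent refresh requests share one in-flight query. */
final class SingleFlightRefresh<T> {
    // Holds the future of the query currently running, or null if none is running.
    private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();
    private final Supplier<T> query;

    SingleFlightRefresh(Supplier<T> query) {
        this.query = query;
    }

    /** Returns the running refresh if there is one, otherwise starts a new one. */
    CompletableFuture<T> refresh() {
        CompletableFuture<T> mine = new CompletableFuture<>();
        while (true) {
            CompletableFuture<T> current = inFlight.get();
            if (current != null) {
                return current; // a refresh is already running: reuse its result
            }
            if (inFlight.compareAndSet(null, mine)) {
                break; // this caller owns the slot and starts the query
            }
        }
        CompletableFuture.supplyAsync(query).whenComplete((value, error) -> {
            // Clear the slot before completing, so a later click starts a fresh query.
            inFlight.compareAndSet(mine, null);
            if (error != null) {
                mine.completeExceptionally(error);
            } else {
                mine.complete(value);
            }
        });
        return mine;
    }
}
```

With a helper like this, several admins clicking refresh while the export domain is unreachable would produce a single query and a single failure rather than one per click.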

Comment 7 Liron Aravot 2014-07-31 13:54:16 UTC
Roman,
The defined behavior is that on export domain failure we attempt to start the SPM on another host to try to get the export domain data.
The case of multiple admins refreshing at the exact same time is rather rare. When we handled the first bug, both a config value to decide the behavior in that case and caching of the export domain content within ovirt were declined for multiple reasons (those scenarios are very rare, there is currently no interest in those solutions, and so on). The decision made was to make the refresh manual, via a button.

Comment 8 Allon Mureinik 2014-07-31 16:45:36 UTC
Sean - your call.

Comment 9 Sean Cohen 2014-08-04 04:53:07 UTC
> (In reply to Allon Mureinik from comment #8)
> Sean - your call.
I agree that the case of multiple admins refreshing at the exact same time is rare, and admins need to be advised on the usage of the domain refresh and its implications.

I suggest improving our documentation of this function with a relevant warning, rather than complicating the code for it.

Adding doc text flag

Thanks
Sean