Description of problem:
When an export storage domain is not accessible and the list of VMs on
this domain is refreshed, the SPM role starts flipping between hosts, even
though the storage domain is later set as inactive by the engine.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Register two hosts to a DC
2. Create a data domain (iSCSI/FC)
3. Create an export domain
4. Stop nfs service on the NFS share
5. Go to the Storage tab, choose the export domain and the VM Import sub-tab
6. Click the refresh icon in the top right corner of the sub-tab a couple of times
Actual results:
Many GetVmsInfoVDSCommand commands are generated before the export domain
is marked as inactive. Each failed GetVmsInfoVDS call makes the SPM role
flip to another hypervisor.
Expected results:
A failed GetVmsInfoVDSCommand on an export/ISO domain does not cause
the SPM to flip.
It would also be beneficial not to let the user trigger more than one
request at a time.
Liron, haven't you solved something similar already?
Allon, as part of bug 958766 we decided to add the manual refresh button but to leave the behavior as is.
Indeed, this was addressed already in RHEV 3.3
"The automatic refresh is replaced with a manual refresh button, which decreases the number of failover attempts."
Closing as dup of bug 958766
*** This bug has been marked as a duplicate of bug 958766 ***
I would like to discuss this before we close it as duplicated.
Bug https://bugzilla.redhat.com/show_bug.cgi?id=958766 was resolved by removing the automatic refresh of the ISO/export domain. This works fine.
The case here is similar but not the same. The issue is in the manual refresh. Keep in mind that a RHEV environment can be managed by more than one admin, so many different people can trigger the domain refresh manually. We have already seen this in a customer environment. The problem is that this can block the environment not just for a couple of minutes, but for much longer.
The refresh should not trigger an SPM flip; it should mark the storage domain as inactive. Likewise, we should not let the customer trigger the refresh operation more than once at a time.
If a refresh operation is already running, do not trigger a new one; wait for the result of the one in progress.
I am not sure exactly how this is implemented or what the obstacles are, but from the user's point of view it does not behave correctly.
Does this make sense?
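The "do not trigger a new refresh while one is running" suggestion above can be sketched as a single-flight guard. This is a hypothetical illustration, not actual oVirt engine code: class and method names (`SingleFlightRefresher`, `refresh`) are invented for the example, and the real refresh work (issuing GetVmsInfoVDSCommand) is stubbed out as an injected `Callable`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: concurrent manual refresh requests share a single
// in-flight operation instead of each issuing a new backend command.
public class SingleFlightRefresher {
    private final AtomicReference<CompletableFuture<String>> inFlight =
            new AtomicReference<>();

    public CompletableFuture<String> refresh(Callable<String> work,
                                             Executor executor) {
        CompletableFuture<String> running = inFlight.get();
        if (running != null && !running.isDone()) {
            // A refresh is already in progress: hand back its future so
            // the second admin just waits for the same result.
            return running;
        }
        CompletableFuture<String> fresh = new CompletableFuture<>();
        if (!inFlight.compareAndSet(running, fresh)) {
            // Another thread started a refresh in the meantime; reuse it.
            return inFlight.get();
        }
        executor.execute(() -> {
            try {
                fresh.complete(work.call());
            } catch (Exception e) {
                fresh.completeExceptionally(e);
            }
        });
        return fresh;
    }
}
```

With this shape, a second click on the refresh icon while the first request is still outstanding returns the same future rather than spawning another GetVmsInfoVDSCommand.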
The defined behavior is that on export domain failure we attempt to start the SPM on another host to try to get the export domain data.
The case of multiple admins refreshing at the exact same time is rather rare. The possibilities of adding a config value to decide the behavior in that case, or of caching the export domain content within oVirt, were declined when we handled the first bug, for multiple reasons (those scenarios being very rare, no current interest in those solutions, and so on); the decision made was to make the refresh manual.
Sean - your call.
> (In reply to Allon Mureinik from comment #8)
> Sean - your call.
I agree that the case of multiple admins refreshing at the exact same time is rather rare, and admins need to be advised on the usage of the domain refresh and its implications.
I suggest improving our documentation of this function with a relevant warning, rather than complicating the code for it.
Adding doc text flag