Bug 1294584

Summary: [Docs] [RFE] Explain what happens when triedVdssList contains all hosts in SPM selection process.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: Documentation
Assignee: Andrew Burden <aburden>
Status: CLOSED CURRENTRELEASE
QA Contact: Julie <juwu>
Severity: low
Docs Contact:
Priority: medium
Version: 3.6.0
CC: aburden, adahms, amureini, gklein, laravot, lsurette, rbalakri, yeylon, ykaul
Target Milestone: ovirt-3.6.3
Keywords: FutureFeature
Target Release: 3.6.3
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-06 23:03:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Docs
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2015-12-29 05:08:43 UTC
Description of problem:

The "Technical Reference" document, section 2.9 (Storage Pool Manager Selection Process) states:

"If the Storage Pool Manager role assignment fails on a new host, the Red Hat Enterprise Virtualization Manager adds the host to a list containing the hosts the operation has failed on. On subsequent iterations of the SPM selection, the Red Hat Enterprise Virtualization Manager attempts to assign the role to a host that is not included in the list."

But it does not say what happens when all hosts are in this list. Does the selection process stop? When does it resume? Is the list cleared? Does the user need to take any steps?

I was digging through the engine code and found that this list might be cleared during "getCurrentIrsProxyData", in backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/irsbroker/IrsBrokerCommand.java. But I am not really sure under which conditions that code is reached.

Comment 1 Andrew Burden 2016-01-05 08:29:28 UTC
Thanks for raising this bug, Germano. 

Allon, are you able to shed any light on this?

Comment 2 Allon Mureinik 2016-01-05 09:48:48 UTC
Liron - you know these areas of the code better than anyone. Can you help Germano and Andrew out here please?

Comment 3 Liron Aravot 2016-01-21 09:46:05 UTC
Hi All,
The quoted text refers to the failover process when executing a single operation. When a new operation is executed, the list is cleared and we try all the relevant hosts again.

For example, let's assume we perform operation A on host 1 and it fails. If we determine that the failure is host specific, we then try host 2, and then host 3. We stop retrying on different hosts when one of the following is true:
1. We have tried all the applicable hosts.
2. We have reached the limit set by the SpmCommandFailOverRetries config value.

When we perform another operation, we'll try executing on host 1 again and so on.
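
A minimal Java sketch of the failover behaviour described above. This is not the engine code: the class name, method name, and parameters are hypothetical, every failure is assumed to be host specific, and SpmCommandFailOverRetries is assumed to cap the number of failed attempts per operation. The point it illustrates is that the list of tried hosts lives only for the duration of one operation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; names do not come from the oVirt engine.
final class SpmFailoverSketch {

    /**
     * Runs one operation with failover across hosts.
     * Returns the host the operation succeeded on, or null if every
     * applicable host was tried or the retry limit was reached.
     * maxFailedAttempts stands in for SpmCommandFailOverRetries (assumption).
     */
    static String runOperation(List<String> hosts,
                               Predicate<String> attempt,
                               int maxFailedAttempts) {
        // A fresh list per operation: hosts that failed during an earlier
        // operation are eligible again.
        Set<String> triedHosts = new HashSet<>();

        while (triedHosts.size() < maxFailedAttempts) {
            // Pick the first host that has not failed during this operation.
            String candidate = null;
            for (String host : hosts) {
                if (!triedHosts.contains(host)) {
                    candidate = host;
                    break;
                }
            }
            if (candidate == null) {
                return null; // stop condition 1: all applicable hosts tried
            }
            if (attempt.test(candidate)) {
                return candidate; // operation succeeded on this host
            }
            triedHosts.add(candidate); // assumed to be a host-specific failure
        }
        return null; // stop condition 2: retry limit reached
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2", "host3");

        // Operation A: host1 fails, so the operation fails over to host2.
        String a = runOperation(hosts, h -> !h.equals("host1"), 3);
        System.out.println("Operation A ran on: " + a);

        // Operation B: the list starts empty again, so host1 is tried first.
        String b = runOperation(hosts, h -> true, 3);
        System.out.println("Operation B ran on: " + b);
    }
}
```

In this sketch, operation B starts with host 1 even though host 1 failed during operation A, because each call builds its own list of tried hosts.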

Let me know if there are any further questions.

Thanks,
Liron.

Comment 4 Germano Veit Michel 2016-01-21 23:07:31 UTC
Hi Liron.

Thank you for your reply.

So the list is cleared for each operation. Perhaps we should state this more clearly in the documentation. The following confused me and the customer:

"On subsequent iterations of the SPM selection, the Red Hat Enterprise Virtualization Manager attempts to assign the role to a host that is not included in the list."

Andrew, what do you think?

Cheers,
Germano

Comment 5 Andrew Burden 2016-01-25 05:56:23 UTC
Thanks for that, Liron. Makes perfect sense now.

I'll update the para in question so as to read:
"
If the SPM role assignment fails on a new host, the Red Hat Enterprise Virtualization Manager adds the host to a list containing the hosts the operation has failed on, marking these hosts as ineligible for the SPM role. This list is cleared at the beginning of the SPM selection process so that all hosts are again eligible.
"

Comment 7 Julie 2016-01-27 00:46:32 UTC
Made some minor edits based on the discussion with Andrew B.
Added revision history entry in both guides. 
Moving this bug to VERIFIED.

Cheers,
Julie