Bug 1141514

Summary: Power Management proxy selection keeps selecting the same proxy first even if it constantly fails
Product: Red Hat Enterprise Virtualization Manager
Reporter: Eli Mesika <emesika>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE
QA Contact: sefi litmanovich <slitmano>
Severity: high
Priority: high
Docs Contact:
Version: 3.5.0
CC: ecohen, gklein, iheim, lpeer, oourfali, pstehlik, rbalakri, Rhev-m-bugs, sherold, slitmano, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: org.ovirt.engine-root-3.5.0-13
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-17 17:08:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1139643
Bug Blocks:

Description Eli Mesika 2014-09-14 09:50:03 UTC
Description of problem:
Power Management proxy selection keeps selecting the same proxy first even if it constantly fails.

Version-Release number of selected component (if applicable):


How reproducible: always


Steps to Reproduce:
1. Have H1 with PM and H2 in cluster C1
2. Have H3 in cluster C2 in the same DC as H1 & H2
3. Block communication from H2 to H1's PM card
4. Restart H1 from the UI Power Management

Actual results:
The stop command first tries to use H2 as a proxy, fails, and then uses H3 as a proxy.
From that point the status of H1 is probed in a loop every 10 seconds.
Each such status command first attempts to use H2 as a proxy, possibly waiting each time for that attempt to fail (20 sec default timeout in ipmilan), and only then successfully uses H3 as a proxy.
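
For illustration only, a minimal sketch of the fixed-order proxy iteration that produces the behaviour above (the host names, the check_status helper, and the timeout values are taken from this report, not from the actual engine code). Every 10-second status probe first burns the full 20-second ipmilan timeout on H2 before falling back to H3:

import time

PROXY_ORDER = ["H2", "H3"]      # selection order is fixed; it is never reordered
IPMILAN_TIMEOUT = 20            # seconds ipmilan waits before reporting failure

def check_status(proxy, target):
    # Hypothetical helper: ask `proxy` to query the PM card of `target`.
    if proxy == "H2":
        time.sleep(IPMILAN_TIMEOUT)   # H2 cannot reach H1's PM card, so it times out
        return None
    return "on"                       # H3 answers immediately

# Status loop as described above: every probe pays the H2 timeout first.
for _ in range(3):
    for proxy in PROXY_ORDER:
        if check_status(proxy, "H1") is not None:
            break
    time.sleep(10)                    # the next probe repeats the same fixed order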

Expected results:
After H2 fails on the stop/start command, all subsequent status attempts should use the proxy that successfully served the stop/start command.
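
A minimal sketch of the expected behaviour (the names below, such as send_fence_command, order_proxies and last_successful_proxy, are illustrative placeholders, not the actual ovirt-engine implementation): once a proxy serves a fence command successfully, it is tried first for subsequent calls:

last_successful_proxy = None

def send_fence_command(proxy, command, target):
    # Stub transport call mirroring the reproduction flow:
    # H2 cannot reach H1's PM card, H3 can.
    return proxy != "H2"

def order_proxies(candidates):
    # Prefer the proxy that last served a fence command successfully.
    if last_successful_proxy in candidates:
        return [last_successful_proxy] + [p for p in candidates
                                          if p != last_successful_proxy]
    return list(candidates)

def fence(command, target, candidates):
    # Try each candidate proxy in preferred order; remember the one that worked.
    global last_successful_proxy
    for proxy in order_proxies(candidates):
        if send_fence_command(proxy, command, target):
            last_successful_proxy = proxy
            return proxy
    return None

# The first stop pays the H2 failure once; every later status call goes
# straight to H3 because it is remembered as the last successful proxy.
print(fence("stop", "H1", ["H2", "H3"]))    # tries H2, fails, succeeds with H3
print(fence("status", "H1", ["H2", "H3"]))  # tries H3 first
print(fence("status", "H1", ["H2", "H3"]))  # still H3 first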

Additional info:

Comment 1 Eyal Edri 2014-09-28 11:29:46 UTC
This bug was moved to MODIFIED before the vt4 build date, thus moving to ON_QA.
If you believe this bug isn't in vt4, please report to rhev-integ.

Comment 2 sefi litmanovich 2014-10-02 11:43:46 UTC
Tried to verify with vt4 - rhevm-3.5.0-0.13.beta.el6ev.noarch.

Couldn't verify due to another bz:

https://bugzilla.redhat.com/show_bug.cgi?id=1139643

Tried to verify according to the flow in the description: the first status command via H2 failed, but the first stop command via H2 was reported as successful even though the connection between H2 and H1's agent was blocked. Start then allegedly worked as well, and because nothing actually happened to the host it came back up in no time. This is exactly what happens in bz 1139643.

Comment 3 Eli Mesika 2014-10-02 14:31:02 UTC
Please make sure that you are testing a version that includes the fix for
https://bugzilla.redhat.com/show_bug.cgi?id=1139643

The fact that the stop command was reported as successful although the connection from the proxy host to H1's PM card was blocked shows that you are testing on a VDSM that does not include the 1139643 fix.

Please attach /usr/share/vdsm/API.py from the host that was selected as a proxy for the STOP operation.

Comment 4 sefi litmanovich 2014-10-14 13:52:58 UTC
Reproduced this bug on rhevm-3.5.0-0.14.beta.el6ev.noarch according to the description.

Comment 5 sefi litmanovich 2014-12-03 11:55:26 UTC
Verified with rhevm-3.5.0-0.20.el6ev.noarch according to the description.

Comment 6 Eyal Edri 2015-02-17 17:08:35 UTC
RHEV 3.5.0 was released. Closing.