Description of problem:
VDSM calls the fencing script asynchronously and returns 0 once the script has been invoked, even though the script may then exit with an error code, as described in the scenario below
Version-Release number of selected component (if applicable):
How reproducible: 100%
DC1 with H1 (with PM) and H2 on cluster C1
another host H3 on DC1 cluster C2
When communication from H2 to the H1 PM card is blocked with iptables
and the default proxy preferences (cluster, dc) are used, a Restart
operation will always fail.
Steps to Reproduce:
1. Add H1 with PM and H2 to cluster C1 on data center DC1
2. Add H3 on cluster C2 on data center DC1
3. Block communication from H2 to H1 PM card IP
4. Restart H1 from UI (Power-Management->Restart)
Actual results:
H2 is selected as the proxy host for the stop operation, and VDSM reports the operation as successful although the fencing script's exit status is 1.
Therefore, we wait for an 'off' status that will never occur, and the host is not rebooted.
Expected results:
VDSM should perform start/stop sync and return the correct script returned code in order that engine will know that H2 fails to perform the operation and will try to use H2 as a proxy for the failed operation
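For illustration only, here is a minimal sketch of what "synchronous, with the real return code" could look like. It is not VDSM's actual code: the function name `run_fence_script`, its parameters, and the `action=off` option string are hypothetical, and a small Python one-liner stands in for a fence script that cannot reach the PM card.

```python
import subprocess
import sys


def run_fence_script(argv, options="", timeout=60):
    """Run a fence script synchronously; return (exit_code, stdout, stderr).

    Hypothetical helper, not part of VDSM.
    """
    # subprocess.run blocks until the child exits, so the caller sees the
    # script's real exit status instead of an unconditional 0.
    proc = subprocess.run(
        argv,
        input=options,        # assumption: options are passed on stdin
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in for a fence script that fails to reach the PM card.
    failing_script = [sys.executable, "-c", "import sys; sys.exit(1)"]
    code, _out, _err = run_fence_script(failing_script, "action=off\n")
    print("fence script exit code:", code)
```

With a non-zero code reported back, the caller can tell that the proxy failed and move on to another proxy instead of polling for an 'off' status.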
Additional info:
> Expected results:
> VDSM should perform start/stop sync and return the correct script returned
> code in order that engine will know that H2 fails to perform the operation
> and will try to use H2 as a proxy for the failed operation
sorry, should be
and will try to use H3 as a proxy for the failed operation
This bug's status was moved to MODIFIED before vdsm vt5 was built,
hence moving to on_qa. If this was a mistake and the fix isn't in,
please contact rhev-integ.
Verified with rhevm-3.5.0-0.14.beta.el6ev.noarch, vdsm-4.16.6-1.el6ev.x86_64 according to the description.
Functionality works as expected. One question though:
Once a fence action fails with proxy A in fence flow1, and proxy B is selected and succeeds, why isn't proxy B used for the rest of flow1?
Because each action is first attempted with proxy A, fails, and is then retried with proxy B, the restart action took 10 minutes, although a restart takes no longer than 3-5 minutes.
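To make the question concrete, here is a rough sketch of a "sticky" proxy selection, where the proxy that last succeeded is tried first for the remaining actions of the flow. This is not the engine's implementation; `run_flow`, `attempt`, and the action names are hypothetical and only illustrate the idea.

```python
def run_flow(actions, proxies, attempt):
    """Run each action through the first proxy that succeeds.

    attempt(proxy, action) returns an exit code; 0 means success.
    Returns the list of (action, proxy) pairs that succeeded.
    """
    preferred = list(proxies)
    used = []
    for action in actions:
        for proxy in preferred:
            if attempt(proxy, action) == 0:
                # Remember the working proxy: try it first from now on.
                preferred = [proxy] + [p for p in preferred if p != proxy]
                used.append((action, proxy))
                break
        else:
            raise RuntimeError("no proxy could perform %r" % action)
    return used


if __name__ == "__main__":
    calls = []

    def attempt(proxy, action):
        # Simulated scenario: H2 cannot reach the PM card, H3 can.
        calls.append((proxy, action))
        return 1 if proxy == "H2" else 0

    print(run_flow(["stop", "status", "start"], ["H2", "H3"], attempt))
    print("attempts made:", len(calls))
```

In this simulation H2 is tried only once, for the first action, rather than once per action.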