Created attachment 1308622 [details]
engine and vdsm log

Description of problem:
Power management operations fail on the bladecenter agent. For example, when running the test command on the bladecenter agent, the engine shows:
Test failed: Internal JSON-RPC error

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.0-0.0.master.20170731224404.git1758643.el7.centos.noarch
fence-agents-bladecenter-4.0.11-66.el7.x86_64
vdsm-4.20.1-280.gite07b232.el7.centos.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure power management on a bladecenter host
2. Run the test command

Actual results:
Test failed: Internal JSON-RPC error

Expected results:
Test succeeds without any problems

Additional info:
On the power management proxy I can see this error:
2017-08-03 11:00:15,715+0300 DEBUG (jsonrpc/7) [root] FAILED: <err> = 'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'; <rc> = 1 (commands:94)
2017-08-03 11:00:15,716+0300 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.80 seconds (__init__:592)
The important part is:

'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'

which means that your fence agent settings are incorrect. Are you sure that the test was successful with the same settings in a previous oVirt version (you have flagged this as a regression)? If you are not sure, or you are using this fence agent for the first time, please check your bladecenter settings and consult the fence_bladecenter man page to find all the options needed to use that fence device.
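For anyone triaging a similar report, the agent's message can be pulled out of the vdsm log instead of being searched for by eye. This is only a rough sketch: it assumes the "FAILED: <err> = ...; <rc> = ..." layout seen in the log lines quoted in this bug, and the function name is made up for illustration.

```python
import ast
import re

# Matches the "FAILED: <err> = '...'; <rc> = N" fragment as it appears in the
# vdsm log lines quoted in this bug. The layout is an assumption based on
# those lines only.
FAILED_RE = re.compile(
    r"FAILED: <err> = (?P<err>'.*?'|\".*?\"); <rc> = (?P<rc>-?\d+)"
)


def extract_fence_error(log_line):
    """Return (message, rc) for a FAILED log line, or None if it does not match.

    Hypothetical helper, not part of vdsm.
    """
    match = FAILED_RE.search(log_line)
    if match is None:
        return None
    raw = match.group("err")
    try:
        # The log stores the repr of the string, so decode escapes like \n.
        message = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Fall back to the raw text without its surrounding quotes.
        message = raw[1:-1]
    return message.strip(), int(match.group("rc"))


SAMPLE = (
    r"2017-08-03 11:00:15,715+0300 DEBUG (jsonrpc/7) [root] FAILED: <err> = "
    r"'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'"
    r"; <rc> = 1 (commands:94)"
)
```

Calling extract_fence_error(SAMPLE) gives the agent's message with the trailing newlines stripped, plus the return code, which is usually enough to tell a settings problem from a transport problem.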
I rechecked the problem, and it turned out to be an issue with the host's power management device. After help from eng-ops, everything works as expected and the "test" command returns the correct message: "Test successful: power on"

But I believe we can still improve the message shown in the UI, with something like "Host power management not reachable" instead of "Test failed: Internal JSON-RPC error".

What do you think, Eli?
(In reply to Artyom from comment #2)
> What do you think Eli?

The UI does not know that the host power management is not reachable; it just reflects the message coming from the engine. Since this is a corner case in which you have a physical problem on the fencing device, I think that the message is OK.

In any case, whenever power management fails via the engine, the first step we ask the reporter to take is to access the fencing agent directly via the fence-agents package scripts in the host's /usr/sbin directory. In this case the reporter will see the exact cause.

We have to understand here that the call goes UX=>engine=>VDSM=>fence script, so for any problem we should start investigating at the fence script end.
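To illustrate starting at the script end, here is a minimal sketch of calling the agent directly on the proxy host, bypassing engine and VDSM. It assumes the agent is installed as /usr/sbin/fence_bladecenter and accepts the common fence-agents long options (--ip, --username, --password, --plug, --action); please verify them against the fence_bladecenter man page for your version. The function names and the example values are hypothetical.

```python
import subprocess

# Assumed location, based on the /usr/sbin directory mentioned above.
FENCE_AGENT = "/usr/sbin/fence_bladecenter"


def build_fence_status_cmd(ip, username, password, plug, agent=FENCE_AGENT):
    """Build the argv for a 'status' query against one blade.

    Hypothetical helper; the option names are assumed to match the
    fence_bladecenter man page.
    """
    return [
        agent,
        "--ip={}".format(ip),
        "--username={}".format(username),
        "--password={}".format(password),
        "--plug={}".format(plug),
        "--action=status",
    ]


def run_fence_status(ip, username, password, plug):
    """Run the agent and return (return code, stdout, stderr)."""
    cmd = build_fence_status_cmd(ip, username, password, plug)
    # Note: a password passed on the command line is visible in the process list.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, run_fence_status("192.0.2.10", "admin", "secret", 3) should surface the same "Unable to obtain correct plug status" text that VDSM logged, if the settings or the device are the problem, without the JSON-RPC wrapping.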