Bug 1477917 - Power management operations fail on the bladecenter agent
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Eli Mesika
Pavel Stehlik
Keywords: Regression
Depends On:
Reported: 2017-08-03 04:17 EDT by Artyom
Modified: 2017-08-07 06:10 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-08-07 06:10:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
engine and vdsm log (292.51 KB, application/zip)
2017-08-03 04:17 EDT, Artyom

Description Artyom 2017-08-03 04:17:20 EDT
Created attachment 1308622
engine and vdsm log

Description of problem:
Power management operations fail on the bladecenter agent.
For example, when running the test command on the bladecenter agent, the engine shows:
Test failed: Internal JSON-RPC error

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure power management on a bladecenter host
2. Run test command

Actual results:
Test failed: Internal JSON-RPC error

Expected results:
Test succeeds without any problems

Additional info:
On the power management proxy I can see this error:
2017-08-03 11:00:15,715+0300 DEBUG (jsonrpc/7) [root] FAILED: <err> = 'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'; <rc> = 1 (commands:94)
2017-08-03 11:00:15,716+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.fenceNode failed (error 1) in 0.80 seconds (__init__:592)
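For readers scanning a long vdsm log for this kind of failure, the following is a minimal sketch of pulling the error text and return code out of a "FAILED" line like the one above. The function name and the regular expression are illustrative assumptions based only on the line format shown in this report; they are not part of VDSM.

```python
import ast
import re

# Pattern modelled on the DEBUG line quoted in this report (an assumption;
# other VDSM versions may format the line differently).
FAILED_RE = re.compile(r"FAILED: <err> = (?P<err>.*?); <rc> = (?P<rc>-?\d+)")


def parse_failed_line(line):
    """Return (error_text, return_code) for a FAILED log line, else None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    raw_err = match.group("err")
    try:
        # The error is logged as a quoted Python literal, so decode escapes
        # such as the trailing \n\n\n.
        err = ast.literal_eval(raw_err)
    except (ValueError, SyntaxError):
        # Fall back to the raw text if it is not a valid literal.
        err = raw_err
    if isinstance(err, bytes):
        err = err.decode("utf-8", errors="replace")
    return str(err).strip(), int(match.group("rc"))
```

Applied to the first log line above, this yields the fence agent's own message and a return code of 1, which is the part of the log that matters for diagnosis.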
Comment 1 Martin Perina 2017-08-03 08:13:23 EDT
The important part is:

 'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'

which means that your fence agent settings are incorrect. Are you sure that the test using the same settings was successful in a previous oVirt version (you have marked this as a regression)? If you are not sure, or are using the fence agent for the first time, please take a look at your bladecenter settings and consult the fence_bladecenter man page to find all the options needed to use that fence device.
Comment 2 Artyom 2017-08-07 03:14:08 EDT
I rechecked, and it was a problem with the host's power management. After help from eng-ops, everything works as expected and the "test" command returns the correct message:
"Test successful: power on"

But I believe we can still improve the messaging in the UI, with something like
"Host power management not reachable" instead of "Test failed: Internal JSON-RPC error".

What do you think Eli?
Comment 3 Eli Mesika 2017-08-07 05:04:20 EDT
(In reply to Artyom from comment #2)

> What do you think Eli?

The UI does not know that the host power management is not reachable; it just reflects the message coming from the engine.

Since this is a corner case in which you have a physical problem with the fencing device, I think the message is OK.

Anyway, whenever power management fails via the engine, the first step we ask the reporter to take is to run the fencing agent directly, using the fence-agents package scripts in the host's /usr/sbin directory; that will show the exact cause.

Keep in mind that the call chain is UX=>engine=>VDSM=>fence script, so for any problem we should start investigating at the script end.
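As an illustration of starting at the script end, here is a minimal sketch of calling fence_bladecenter directly on the proxy host, bypassing the engine and VDSM. The helper names and all argument values are hypothetical placeholders. The option names (--ip, --username, --password, --plug, --action) are taken from the fence_bladecenter man page as I understand it; please verify them against the version installed on your host.

```python
import subprocess


def build_fence_command(ip, username, password, plug, action="status"):
    """Build the argument list for a direct fence_bladecenter call.

    All values are supplied by the caller; nothing here comes from the engine.
    Note that a password passed on the command line is visible in the
    process list, so use this only for a quick manual check.
    """
    return [
        "/usr/sbin/fence_bladecenter",
        f"--ip={ip}",
        f"--username={username}",
        f"--password={password}",
        f"--plug={plug}",
        f"--action={action}",
    ]


def run_fence_check(argv):
    """Run the agent and return (return_code, stdout, stderr)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout.strip(), result.stderr.strip()
```

For example, `run_fence_check(build_fence_command("192.0.2.10", "admin", "secret", 3))` would run a status query against blade 3 using placeholder credentials. A non-zero return code together with the agent's own error text points at the device or its settings rather than at the engine.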
