Bug 987295 - PRD35 - [RFE] Add periodic power management health check to detect/warn about link-down detection of power management LAN
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assigned To: Eli Mesika
QA Contact: sefi litmanovich
Whiteboard: infra
Keywords: FutureFeature
Duplicates: 845232
Depends On: 1090800
Blocks: 1075672 958503 960739 999431 rhev3.5beta 1156165
Reported: 2013-07-23 03:45 EDT by yuzuru.maya.zn
Modified: 2016-02-10 14:36 EST

See Also:
Fixed In Version: vt1.3
Doc Type: Release Note
Doc Text:
With this release, a periodic power management health check has been added to detect and warn about link-down on the power management LAN.
Story Points: ---
Clone Of:
Clones: 999431
Environment:
Last Closed: 2015-02-11 12:53:47 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 27367 None None None Never

Description yuzuru.maya.zn 2013-07-23 03:45:55 EDT
1. Feature Overview:
  a) Name of feature:
   Support link-down detection of management LAN

  b) Feature Description:
     Currently, RHEV-M does not monitor the management LAN, so it cannot detect a link-down on it. With this feature, RHEV-M monitors the management LAN in order to detect a link-down.

2. Feature Details:

  a) Architectures:
     64-bit Intel EM64T/AMD64

  b) Bugzilla Dependencies:
     None

  c) Drivers or hardware dependencies:
     None

  d) Upstream acceptance information:
     None

  e) External links:
     None

  f) Severity (H,M,L):
     High

  g) Feature Needed by:
     2014 1Q
Comment 2 Itamar Heim 2013-07-23 05:44:46 EDT
can you please elaborate - if the management network is down, rhev-m will move the host to non-responsive (and can send an email alert via the notification service)?
Comment 4 Itamar Heim 2013-08-05 09:59:18 EDT
the power management has a "check status" option to validate the device and config are correct (the host itself may not have any access to it).
it sounds like doing a power management status test every X minutes will cover this?
Comment 5 Satoru Moriya 2013-08-06 22:45:12 EDT
(In reply to Itamar Heim from comment #4)
> the power management has a "check status" option to validate the device and
> config are correct (the host itself may not have any access to it).

How can we use that option?
Host tab -> Select host -> Edit -> Power Management -> test?

> it sounds like doing a power management status test every X minutes will
> cover this?

Yes, that's what we need.
Comment 6 Itamar Heim 2013-08-07 02:23:50 EDT
(In reply to Satoru Moriya from comment #5)
> (In reply to Itamar Heim from comment #4)
> > the power management has a "check status" option to validate the device and
> > config are correct (the host itself may not have any access to it).
> 
> How can we use that option?
> Host tab -> Select host -> Edit -> Power Management -> test?

yes. you can also do this via scripting via the /api/hosts/<hostid>/fence 
with fence_type status option.

> 
> > it sounds like doing a power management status test every X minutes will
> > cover this?
> 
> Yes, that's what we need.

i suggest using scripting for now, but keeping this RFE open to consider adding an hourly power management status check.
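
For readers following the "scripting for now" suggestion, the periodic part can be sketched in a few lines of Python. This is only an illustration, not what RHEV-M implements: run_periodic_check and its parameters are hypothetical names, and check stands for any callable that performs one power management status test and returns an (ok, detail) pair.

```python
import time


def run_periodic_check(check, interval_seconds, rounds, warn=print, sleep=time.sleep):
    """Run check() `rounds` times and warn about every failed attempt.

    `check` is a hypothetical callable returning (ok, detail); it is not
    part of any RHEV-M API. Returns the number of failed checks.
    """
    failures = 0
    for attempt in range(rounds):
        ok, detail = check()
        if not ok:
            failures += 1
            warn("power management status check failed: {}".format(detail))
        # Sleep between checks, but not after the last one.
        if attempt < rounds - 1:
            sleep(interval_seconds)
    return failures
```

With interval_seconds=3600 this corresponds to the hourly check mentioned above; the warn callable could be swapped for something that sends mail or writes to a log.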
Comment 7 Satoru Moriya 2013-08-20 04:13:40 EDT
(In reply to Itamar Heim from comment #6)
> (In reply to Satoru Moriya from comment #5)
> > (In reply to Itamar Heim from comment #4)
> > > the power management has a "check status" option to validate the device and
> > > config are correct (the host itself may not have any access to it).
> > 
> > How can we use that option?
> > Host tab -> Select host -> Edit -> Power Management -> test?
> 
> yes. you can also do this via scripting via the /api/hosts/<hostid>/fence 
> with fence_type status option.

We can get the power management status in the following two ways:

1. REST API
 # curl -X POST -H "Accept: application/xml" -H "Content-Type:application/xml" \
  -u [USER:PASS] --cacert [CERT] \
  -d "<action><fence_type>status</fence_type></action>" \
  https://[RHEVM HOST]:443/api/hosts/[HOST id]/fence

2. rhevm-shell
 # rhevm-shell -c -l "https://[RHEVM HOST]/api" -u USER -A rhevm.cer
 # action host [HOST] fence --fence_type status

Unfortunately, we can't find any reference for #2, in particular the options
for the fence action (e.g. --fence_type), in the "RHEV 3.2 Command Line Shell Guide".
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.2/html/Command_Line_Shell_Guide/index.html

We guessed the option name, --fence_type, from the XML that we use in #1 (REST API).
Do you have any documents that describe the --fence_type option and the like?
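
The REST call in #1 can also be issued from a script. The following Python sketch mirrors the curl command (same path, headers and XML body); the function names, the example host name and the injectable send parameter are assumptions made for illustration. It deliberately does not parse the response, because the response format is not shown in this report, and a successful HTTP reply does not by itself prove that the fence device answered, so the returned body still has to be inspected.

```python
import base64
import ssl
import urllib.request

# Same XML body as in the curl example above.
FENCE_STATUS_BODY = b"<action><fence_type>status</fence_type></action>"


def build_fence_status_request(api_base, host_id, user, password):
    """Build the POST request for <api_base>/hosts/<host_id>/fence."""
    url = "{}/hosts/{}/fence".format(api_base.rstrip("/"), host_id)
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=FENCE_STATUS_BODY, method="POST")
    request.add_header("Accept", "application/xml")
    request.add_header("Content-Type", "application/xml")
    request.add_header("Authorization", "Basic " + token)
    return request


def send_over_https(request, cafile, timeout=30):
    """Send the request, verifying the server against the CA file (like --cacert)."""
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read()


def check_once(request, send):
    """Return (True, response body) on success or (False, error text) on failure.

    `send` is any callable taking the request, e.g. a wrapper around
    send_over_https. Network and HTTP errors from urllib are OSError subclasses.
    """
    try:
        return True, send(request)
    except OSError as exc:
        return False, str(exc)
```

A call such as check_once(request, lambda r: send_over_https(r, "rhevm.cer")) performs one status test; it could be run from cron or from a loop to get a check every X minutes.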
Comment 8 Itamar Heim 2013-08-20 05:17:19 EDT
moving bug to guides component to review the documentation for missing part.
thanks for the report.
Comment 10 Satoru Moriya 2013-08-21 01:30:21 EDT
(In reply to Itamar Heim from comment #8)
> moving bug to guides component to review the documentation for missing part.
> thanks for the report.

Thanks.
Fixing the documentation would be really helpful to us.

On the other hand, we originally opened this bz to request the following
feature, which you mentioned in #4:

> it sounds like doing a power management status test every X minutes will
> cover this?

So, we'd like to keep this bz as an RFE for considering an hourly power
management status check, not turn it into a documentation fix.

How should we handle both the documentation fix and the RFE?
Should we open a bz for each request?
(Should we open a new bz for the documentation fix?)
Comment 11 Itamar Heim 2013-08-21 06:14:46 EDT
good point. changing title accordingly and moving back. I'll also clone for the docs issue.
Comment 12 Itamar Heim 2013-08-22 04:46:27 EDT
*** Bug 845232 has been marked as a duplicate of this bug. ***
Comment 19 Tareq Alayan 2014-05-29 05:26:22 EDT
Hi Eli, 

What behaviour was decided on in the end, according to comment 18? 
In other words, what should be tested here and how?
Comment 20 Eli Mesika 2014-05-29 07:35:18 EDT
(In reply to Tareq Alayan from comment #19)
> Hi Eli, 
> 
> What behaviour was decided on in the end, according to comment 18? 
> In other words, what should be tested here and how?

Please read the feature doc
http://www.ovirt.org/Features/Design/DetailedPMHealthCheck
Comment 23 errata-xmlrpc 2015-02-11 12:53:47 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html
