Bug 987295 - PRD35 - [RFE] Add periodic power management health check to detect/warn about link-down detection of power management LAN
Summary: PRD35 - [RFE] Add periodic power management health check to detect/warn about...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Eli Mesika
QA Contact: sefi litmanovich
URL:
Whiteboard: infra
Duplicates: 845232
Depends On: 1090800
Blocks: 958503 960739 999431 1075672 rhev3.5beta 1156165
Reported: 2013-07-23 07:45 UTC by yuzuru.maya.zn
Modified: 2016-02-10 19:36 UTC
CC List: 25 users

Fixed In Version: vt1.3
Doc Type: Release Note
Doc Text:
With this release, support for periodic power management health check to detect and warn about link-down detection of power management LAN has been added.
Clone Of:
Clones: 999431
Environment:
Last Closed: 2015-02-11 17:53:47 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2015:0158 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Virtualization Manager 3.5.0 | 2015-02-11 22:38:50 UTC
oVirt gerrit | 27367 | 0 | None | None | None | Never

Description yuzuru.maya.zn 2013-07-23 07:45:55 UTC
1. Feature Overview:
  a) Name of feature:
   Support link-down detection of management LAN

  b) Feature Description:
     Currently, RHEV-M does not monitor the management LAN, so it cannot detect a link-down on it. With this feature, RHEV-M monitors the management LAN in order to detect a link-down.

2. Feature Details:

  a) Architectures:
     64-bit Intel EM64T/AMD64

  b) Bugzilla Dependencies:
     None

  c) Drivers or hardware dependencies:
     None

  d) Upstream acceptance information:
     None

  e) External links:
     None

  f) Severity (H,M,L):
     High

  g) Feature Needed by:
     2014 1Q

Comment 2 Itamar Heim 2013-07-23 09:44:46 UTC
can you please elaborate - if the management network is down, rhev-m will move the host to non-responsive (and can send an email alert via the notification service)?

Comment 4 Itamar Heim 2013-08-05 13:59:18 UTC
the power management has a "check status" option to validate the device and config are correct (the host itself may not have any access to it).
it sounds like doing a power management status test every X minutes will cover this?

Comment 5 Satoru Moriya 2013-08-07 02:45:12 UTC
(In reply to Itamar Heim from comment #4)
> the power management has a "check status" option to validate the device and
> config are correct (the host itself may not have any access to it).

How can we use that option?
Host tab -> Select host -> Edit -> Power Management -> test?

> it sounds like doing a power management status test every X minutes will
> cover this?

Yes, that's what we need.

Comment 6 Itamar Heim 2013-08-07 06:23:50 UTC
(In reply to Satoru Moriya from comment #5)
> (In reply to Itamar Heim from comment #4)
> > the power management has a "check status" option to validate the device and
> > config are correct (the host itself may not have any access to it).
> 
> How can we use that option?
> Host tab -> Select host -> Edit -> Power Management -> test?

yes. you can also do this via scripting via the /api/hosts/<hostid>/fence 
with fence_type status option.

> 
> > it sounds like doing a power management status test every X minutes will
> > cover this?
> 
> Yes, that's what we need.

i suggest using scripting for now, while keeping this RFE open to consider adding an hourly power management status check.
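
A minimal sketch of the "status test every X minutes" idea as an external script, in Python. This is not how the engine implements the feature; the function names, the `check_status` callable (standing in for whatever performs the fence status test for one host) and the `warn` callable are all hypothetical.

```python
import time


def run_health_checks(host_ids, check_status, warn):
    """Run one pass of the status test over all hosts.

    check_status(host_id) is a hypothetical callable that returns True when
    the power management status test succeeds. Any exception it raises
    (for example, the power management LAN being unreachable) is treated
    as a failed check. Returns the list of hosts that failed.
    """
    failed = []
    for host_id in host_ids:
        try:
            healthy = check_status(host_id)
            detail = "status test did not succeed"
        except Exception as exc:
            healthy = False
            detail = f"status test raised {exc!r}"
        if not healthy:
            failed.append(host_id)
            warn(f"power management check failed for host {host_id}: {detail}")
    return failed


def poll_periodically(host_ids, check_status, warn,
                      interval_seconds=3600, sleep=time.sleep, max_rounds=None):
    """Repeat the check every interval_seconds (hourly by default).

    max_rounds=None means run until interrupted; sleep is injectable so the
    loop can be exercised without waiting.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        run_health_checks(host_ids, check_status, warn)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
```

In practice `check_status` would wrap the fence status call discussed in this thread and `warn` would write to a log or send mail; both are left to the caller here.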

Comment 7 Satoru Moriya 2013-08-20 08:13:40 UTC
(In reply to Itamar Heim from comment #6)
> (In reply to Satoru Moriya from comment #5)
> > (In reply to Itamar Heim from comment #4)
> > > the power management has a "check status" option to validate the device and
> > > config are correct (the host itself may not have any access to it).
> > 
> > How can we use that option?
> > Host tab -> Select host -> Edit -> Power Management -> test?
> 
> yes. you can also do this via scripting via the /api/hosts/<hostid>/fence 
> with fence_type status option.

We can get the power management status in the following 2 ways:

1. REST API
 # curl -X POST -H "Accept: application/xml" -H "Content-Type:application/xml" \
  -u [USER:PASS] --cacert [CERT] \
  -d "<action><fence_type>status</fence_type></action>" \
  https://[RHEVM HOST]:443/api/hosts/[HOST id]/fence

2. rhevm-shell
 # rhevm-shell -c -l "https://[RHEVM HOST]/api" -u USER -A rhevm.cer
 # action host [HOST] fence --fence_type status

Unfortunately, we can't find any reference for #2, in particular the options
for the fence action (e.g. --fence_type), in "RHEV3.2 Command Line Shell Guide".
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.2/html/Command_Line_Shell_Guide/index.html

We guessed the option name, --fence_type, from the XML which we use in #1 (REST API).
Do you have any documents which describe the --fence_type option etc?
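
For scripting, the REST call in #1 can also be made without curl. The following is a rough Python sketch using only the standard library; the function names are made up for illustration, the placeholders mirror the curl command above, and the response is returned as raw XML because its layout is not described in this bug.

```python
import base64
import ssl
import urllib.request

# Same request body as the -d argument of the curl command in #1.
FENCE_STATUS_BODY = b"<action><fence_type>status</fence_type></action>"


def build_fence_status_request(engine_host, host_id, user, password):
    """Build the POST request for /api/hosts/<host id>/fence (status test)."""
    url = f"https://{engine_host}:443/api/hosts/{host_id}/fence"
    # Equivalent of curl's -u USER:PASS (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Accept": "application/xml",
        "Content-Type": "application/xml",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(
        url, data=FENCE_STATUS_BODY, headers=headers, method="POST"
    )


def send_fence_status(request, ca_file, timeout=30):
    """Send the request, verifying the server against ca_file (curl's --cacert).

    Returns the response body as text; interpreting the XML is left to the caller.
    """
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would look like `send_fence_status(build_fence_status_request("[RHEVM HOST]", "[HOST id]", "USER", "PASS"), "rhevm.cer")`, with the bracketed values replaced as in the curl example.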

Comment 8 Itamar Heim 2013-08-20 09:17:19 UTC
moving bug to guides component to review the documentation for missing part.
thanks for the report.

Comment 10 Satoru Moriya 2013-08-21 05:30:21 UTC
(In reply to Itamar Heim from comment #8)
> moving bug to guides component to review the documentation for missing part.
> thanks for the report.

Thanks.
Fixing documentation is really helpful to us.

On the other hand, we originally opened this bz to request the following
feature, which you mentioned in comment #4.

> it sounds like doing a power management status test every X minutes will
> cover this?

So, we'd like to keep this bz as an RFE, not a documentation fix, so that
adding an hourly power management status check can be considered.

How should we handle both the documentation fix and the RFE?
Should we open a bz for each request?
(Should we open a new bz for the documentation fix?)

Comment 11 Itamar Heim 2013-08-21 10:14:46 UTC
good point. changing title accordingly and moving back. I'll also clone for the docs issue.

Comment 12 Itamar Heim 2013-08-22 08:46:27 UTC
*** Bug 845232 has been marked as a duplicate of this bug. ***

Comment 19 Tareq Alayan 2014-05-29 09:26:22 UTC
Hi Eli, 

What behaviour was decided on in the end, according to comment 18? 
In other words, what should be tested here and how?

Comment 20 Eli Mesika 2014-05-29 11:35:18 UTC
(In reply to Tareq Alayan from comment #19)
> Hi Eli, 
> 
> What behaviour was decided on in the end, according to comment 18? 
> In other words, what should be tested here and how?

Please read the feature doc
http://www.ovirt.org/Features/Design/DetailedPMHealthCheck

Comment 23 errata-xmlrpc 2015-02-11 17:53:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html

