Bug 1045139 - In the event of a full host power outage (including fence devices) VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED alert remains in audit log
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.0
Assigned To: Eli Mesika
QA Contact: Tareq Alayan
Whiteboard: infra
Duplicates: 1084466
Depends On:
Blocks: 1044088
Reported: 2013-12-19 12:09 EST by Julio Entrena Perez
Modified: 2016-02-10 14:16 EST
CC List: 14 users

See Also:
Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Previously, a full host power outage followed by 18 failed fencing attempts resulted in the following alert being added to the audit log: "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually". The alert was recorded with an empty Host ID and therefore was not removed from the database once manual fencing was executed. Now, this issue has been corrected and the alert is removed from the audit log after manually rebooting the host.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-09 11:07:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 22911 None None None Never
Red Hat Product Errata RHSA-2014:0506 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 14:55:38 EDT

Description Julio Entrena Perez 2013-12-19 12:09:54 EST
Description of problem:
In the event of a full host power outage (including fence devices) a "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is added to audit log after 18 failed fencing attempts.
The alert is not removed once the problem is resolved and the host is restarted.

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1.  Remove all power to an active host, including any fence agents that are configured.
2.  Wait 9 minutes for the reconnection timeout to elapse and the fencing attempts to begin.
3.  Keep waiting for 18 fencing attempts to happen.
4.  Observe "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert added to audit log.
5.  Restore power to host.
6.  Restart host and "Confirm host has been rebooted".
7.  In webadmin portal edit host, go to "Power Management" tab, click test button to verify that fencing works.

Actual results:
"Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert stays in audit log.

Expected results:
"Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is removed from audit log.

Additional info:
2013-12-13 12:54:36,972 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (pool-4-thread-48) [514c9cdf] FINISH, FenceVdsVDSCommand, return: Test Failed, Getting status of IPMI:1.2.3.4...Chassis power = Unknown
Failed
, log id: 1d1778a8
[...]
2013-12-13 12:54:36,974 ERROR [org.ovirt.engine.core.bll.FenceVdsBaseCommand] (pool-4-thread-48) [514c9cdf] Failed to verify host <hostname> stop status. Have retried 18 times with delay of 10 seconds between each retry.
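
As a rough worked example, the figures quoted in this report (the 9 minute wait from the reproduction steps, and the 18 retries with a 10 second delay from the log line above) give a lower bound on how long a host must stay unpowered before the alert appears. This is only a sketch: it assumes the 18 fencing attempts in the steps correspond to the 18 retries in the log, and it ignores the time each fence command itself takes. The function and constant names are made up for illustration.

```python
# Values taken from this report; the names are hypothetical.
RECONNECT_TIMEOUT_S = 9 * 60  # "Wait 9 minutes" in the reproduction steps
RETRIES = 18                  # "Have retried 18 times" in the engine log
RETRY_DELAY_S = 10            # "delay of 10 seconds between each retry"


def min_seconds_until_alert(reconnect_timeout_s, retries, retry_delay_s):
    """Lower bound: reconnect timeout plus the delays between retries.

    The duration of each individual fence command is not included.
    """
    return reconnect_timeout_s + retries * retry_delay_s


total = min_seconds_until_alert(RECONNECT_TIMEOUT_S, RETRIES, RETRY_DELAY_S)
print(f"at least {total} s (about {total // 60} min)")
```

Under these assumptions the alert would show up no sooner than about 12 minutes after power is lost.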

engine=> select * from audit_log where message like 'Failed to verify Host%';
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------
audit_log_id        | 7388
user_id             | 00000000-0000-0000-0000-000000000000
user_name           | 
vm_id               | 00000000-0000-0000-0000-000000000000
vm_name             | 
vm_template_id      | 
vm_template_name    | 
vds_id              | 
vds_name            | 
log_time            | 2013-12-13 06:54:36.972-05
log_type_name       | VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED
log_type            | 9005
severity            | 10
message             | Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually.
processed           | f
storage_pool_id     | 
storage_pool_name   | 
storage_domain_id   | 
storage_domain_name | 
vds_group_id        | 00000000-0000-0000-0000-000000000000
vds_group_name      | 
correlation_id      | 
job_id              | 
quota_id            | 
quota_name          | 
gluster_volume_id   | 00000000-0000-0000-0000-000000000000
gluster_volume_name | 
origin              | oVirt
custom_event_id     | -1
event_flood_in_sec  | 30
custom_data         | 
deleted             | f
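
The record above has a blank vds_id, which is what later turns out to matter. A minimal sketch of a query that lists such alerts follows, assuming the blank value is a SQL NULL. It runs against an in-memory SQLite table that models only a few of the columns shown above, not the real engine schema; the second row and its host ID are invented for contrast.

```python
import sqlite3

# Hypothetical, reduced stand-in for the engine's audit_log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_log (
        audit_log_id  INTEGER PRIMARY KEY,
        vds_id        TEXT,
        log_type_name TEXT,
        log_type      INTEGER,
        deleted       INTEGER
    )
    """
)

HOST = "11111111-1111-1111-1111-111111111111"  # made-up host ID
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
    [
        # Models the record above: vds_id is NULL.
        (7388, None, "VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED", 9005, 0),
        # Invented row showing an alert that does carry a host ID.
        (7389, HOST, "VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED", 9005, 0),
    ],
)


def orphaned_fence_alerts(db):
    """Return IDs of fence-verification alerts that carry no host ID."""
    rows = db.execute(
        """
        SELECT audit_log_id FROM audit_log
        WHERE log_type_name = 'VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED'
          AND vds_id IS NULL
          AND deleted = 0
        ORDER BY audit_log_id
        """
    )
    return [row[0] for row in rows]


print(orphaned_fence_alerts(conn))
```

Only the row without a host ID is returned; an equivalent `vds_id IS NULL` filter could be added to the audit_log query shown above.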
Comment 2 Eli Mesika 2013-12-22 18:58:31 EST
(In reply to Julio Entrena Perez from comment #0)

I don't think this is a bug. The event just records that this occurred. The only alerts that are removed are the ones indicating that PM is not configured or is configured improperly; those will change once the PM configuration is changed and saved or tested again.
Comment 3 Julio Entrena Perez 2013-12-30 05:18:26 EST
(In reply to Eli Mesika from comment #2)
> those will change once the PM
> configuration is changed and saved or tested again.

Customer has already saved Power Management settings multiple times but "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alerts remain, so does this bug.
Comment 4 Eli Mesika 2014-01-01 08:51:12 EST
(In reply to Julio Entrena Perez from comment #3)
> (In reply to Eli Mesika from comment #2)
> > those will change once the PM
> > configuration is changed and saved or tested again.
> 
> Customer has already saved Power Management settings multiple times but
> "Failed to verify Host <hostname> Restart status, Please Restart Host
> <hostname> manually." alerts remain, so does this bug.

Those alerts are removed when the Host is fenced manually, i.e. from the UI, right-click the Host and select "confirm that Host has been rebooted"; this will clear those alerts.

You must actually reboot the Host manually first, as stated in the dialog message.

Please let me know if it works for you.
Comment 5 Eli Mesika 2014-01-01 10:27:47 EST
The problem was that when this Alert was recorded, it was recorded with an empty Host ID; therefore, it was not removed from the database when the manual fencing procedure was executed.

Removing the needinfo after talking with the BZ reporter and identifying the cause of the BZ.
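
To illustrate why an empty Host ID keeps the alert in place, here is a small sketch of a cleanup keyed on the host ID. It is not the engine's actual code: the table is a reduced in-memory SQLite stand-in, and `remove_host_alerts` and the host ID are hypothetical. It assumes the empty Host ID is stored as NULL, which never compares equal to a concrete ID.

```python
import sqlite3

# Hypothetical, reduced stand-in for the audit_log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    "audit_log_id INTEGER PRIMARY KEY, vds_id TEXT, log_type INTEGER)"
)

HOST = "11111111-1111-1111-1111-111111111111"  # made-up host ID
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?)",
    [
        (1, None, 9005),  # alert recorded with an empty Host ID
        (2, HOST, 9005),  # same alert type recorded with the Host ID
    ],
)


def remove_host_alerts(db, host_id, log_type):
    """Delete alerts of one type for one host; return how many were removed."""
    cur = db.execute(
        "DELETE FROM audit_log WHERE vds_id = ? AND log_type = ?",
        (host_id, log_type),
    )
    return cur.rowcount


removed = remove_host_alerts(conn, HOST, 9005)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT audit_log_id FROM audit_log ORDER BY audit_log_id"
    )
]
print(removed, remaining)
```

The row with the NULL host ID survives the delete, which matches the behaviour described in this comment: the alert can only be cleaned up by host if it was recorded with the Host ID in the first place.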
Comment 6 Sandro Bonazzola 2014-01-14 03:44:52 EST
ovirt 3.4.0 alpha has been released
Comment 7 Tareq Alayan 2014-03-12 08:56:05 EDT
is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch
Comment 8 Eli Mesika 2014-03-12 10:05:23 EDT
(In reply to Tareq Alayan from comment #7)
> is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch

rhevm-3.4.0-0.3.master.el6ev.noarch is AV2 
BZ is part of AV2.1
Comment 9 Tareq Alayan 2014-03-18 08:08:20 EDT
Verified: unable to reproduce. Tested on rhevm-3.4.0-0.5.master.el6ev.noarch.
Comment 11 Pablo Iranzo Gómez 2014-04-04 08:56:18 EDT
*** Bug 1084466 has been marked as a duplicate of this bug. ***
Comment 12 errata-xmlrpc 2014-06-09 11:07:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html
