1045139 – In the event of a full host power outage (including fence devices) VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED alert remains in audit log

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.2.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.4.0
Assignee:	Eli Mesika
QA Contact:	Tareq Alayan
Docs Contact:
URL:
Whiteboard:	infra
Duplicates (1):	1084466
Depends On:
Blocks:	1044088

Reported:	2013-12-19 17:09 UTC by Julio Entrena Perez
Modified:	2019-05-20 11:07 UTC
CC List:	14 users
Fixed In Version:	ovirt-3.4.0-alpha1
Doc Type:	Bug Fix
Doc Text:	Previously, a full host power outage followed by 18 failed fencing attempts resulted in the following alert being added to the audit log: "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually". The alert was recorded with an empty Host ID and therefore was not removed from the database once manual fencing was executed. Now, this issue has been corrected and the alert is removed from the audit log after manually rebooting the host.
Clone Of:
Environment:
Last Closed:	2014-06-09 15:07:52 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2014:0506	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update	2014-06-09 18:55:38 UTC
oVirt gerrit	22911	0	None	MERGED	Fixing a bug in Alert logging.	2020-05-07 14:29:36 UTC

Description Julio Entrena Perez 2013-12-19 17:09:54 UTC

Description of problem:
In the event of a full host power outage (including fence devices), a "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is added to the audit log after 18 failed fencing attempts.
The alert is not removed once the problem is resolved and the host is restarted.

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1.  Remove all power to an active host, including any fence agents that are configured.
2.  Wait 9 minutes for the reconnection timeout to elapse and the fencing attempts to begin.
3.  Keep waiting for 18 fencing attempts to happen.
4.  Observe "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert added to audit log.
5.  Restore power to host.
6.  Restart host and "Confirm host has been rebooted".
7.  In the webadmin portal, edit the host, go to the "Power Management" tab, and click the test button to verify that fencing works.

Actual results:
"Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert stays in audit log.

Expected results:
"Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is removed from audit log.

Additional info:
2013-12-13 12:54:36,972 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (pool-4-thread-48) [514c9cdf] FINISH, FenceVdsVDSCommand, return: Test Failed, Getting status of IPMI:1.2.3.4...Chassis power = Unknown
Failed
, log id: 1d1778a8
[...]
2013-12-13 12:54:36,974 ERROR [org.ovirt.engine.core.bll.FenceVdsBaseCommand] (pool-4-thread-48) [514c9cdf] Failed to verify host <hostname> stop status. Have retried 18 times with delay of 10 seconds between each retry.
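
The retry behaviour reported in the log line above (18 attempts with a 10 second delay between them) can be pictured with a minimal Python sketch. This is not the ovirt-engine implementation; `verify_with_retries`, `check` and the injectable `sleep` are hypothetical names used only for illustration, and a plain fixed-delay loop is an assumption.

```python
import time
from typing import Callable, List


def verify_with_retries(
    check: Callable[[], bool],
    retries: int = 18,             # value taken from the log message
    delay_seconds: float = 10.0,   # value taken from the log message
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `retries` times, pausing between attempts.

    Returns True as soon as check() succeeds, False if every attempt fails.
    """
    for attempt in range(retries):
        if check():
            return True
        # Pause only between attempts, not after the last one.
        if attempt < retries - 1:
            sleep(delay_seconds)
    return False


# Simulate a host whose fence device never answers (hypothetical check).
# A fake sleep records the delays instead of actually waiting.
recorded_delays: List[float] = []
attempts: List[int] = []


def status_never_verified() -> bool:
    attempts.append(1)
    return False


verified = verify_with_retries(status_never_verified, sleep=recorded_delays.append)
```

With a check that never succeeds, the function gives up after the last attempt and returns False, which corresponds to the point at which the alert in this report is raised.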

engine=> select * from audit_log where message like 'Failed to verify Host%';
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------
audit_log_id        | 7388
user_id             | 00000000-0000-0000-0000-000000000000
user_name           | 
vm_id               | 00000000-0000-0000-0000-000000000000
vm_name             | 
vm_template_id      | 
vm_template_name    | 
vds_id              | 
vds_name            | 
log_time            | 2013-12-13 06:54:36.972-05
log_type_name       | VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED
log_type            | 9005
severity            | 10
message             | Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually.
processed           | f
storage_pool_id     | 
storage_pool_name   | 
storage_domain_id   | 
storage_domain_name | 
vds_group_id        | 00000000-0000-0000-0000-000000000000
vds_group_name      | 
correlation_id      | 
job_id              | 
quota_id            | 
quota_name          | 
gluster_volume_id   | 00000000-0000-0000-0000-000000000000
gluster_volume_name | 
origin              | oVirt
custom_event_id     | -1
event_flood_in_sec  | 30
custom_data         | 
deleted             | f
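
To check whether other alerts are in the same state, a query along the following lines may help. The sketch uses Python's sqlite3 module with a cut-down, hypothetical copy of the audit_log table (only four of the columns shown above) and assumes that the blank vds_id in the psql output is either NULL or an empty string. Against a real engine database the WHERE clause would need to be adapted and run through psql instead.

```python
import sqlite3

FENCE_ALERT = "VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED"


def find_alerts_without_host(conn: sqlite3.Connection) -> list:
    """Return audit_log_id values of fence alerts that carry no host ID."""
    rows = conn.execute(
        "SELECT audit_log_id FROM audit_log "
        "WHERE log_type_name = ? "
        "AND (vds_id IS NULL OR vds_id = '') "
        "AND deleted = 0 "
        "ORDER BY audit_log_id",
        (FENCE_ALERT,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical, simplified schema; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    "audit_log_id INTEGER PRIMARY KEY, "
    "vds_id TEXT, "
    "log_type_name TEXT, "
    "deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [
        (7388, None, FENCE_ALERT, 0),       # mirrors the record above
        (7389, "host-a", FENCE_ALERT, 0),   # made-up row with a host ID
        (7390, None, "SOME_OTHER_EVENT", 0),  # made-up unrelated event
    ],
)

alerts_without_host = find_alerts_without_host(conn)
```

Only the row that mirrors record 7388 is returned; the made-up rows are there to show that alerts with a host ID, and other event types, are not matched.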

Comment 2 Eli Mesika 2013-12-22 23:58:31 UTC

(In reply to Julio Entrena Perez from comment #0)

I don't think this is a bug. The event just reports that this occurred. The only alerts that are removed are the ones indicating that PM is not configured or is configured improperly; those will change once the PM configuration is changed and saved or tested again.

Comment 3 Julio Entrena Perez 2013-12-30 10:18:26 UTC

(In reply to Eli Mesika from comment #2)
> those will change once the PM
> configuration is changed and saved or tested again.

Customer has already saved Power Management settings multiple times but "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alerts remain, so does this bug.

Comment 4 Eli Mesika 2014-01-01 13:51:12 UTC

(In reply to Julio Entrena Perez from comment #3)
> (In reply to Eli Mesika from comment #2)
> > those will change once the PM
> > configuration is changed and saved or tested again.
> 
> Customer has already saved Power Management settings multiple times but
> "Failed to verify Host <hostname> Restart status, Please Restart Host
> <hostname> manually." alerts remain, so does this bug.

Those alerts are removed when the Host is fenced manually, i.e. from the UI, right-click the Host and select "confirm that Host has been rebooted". This will clear those alerts.

This requires that you actually reboot the Host manually first, as stated in the dialog message.

Please let me know if it works for you.

Comment 5 Eli Mesika 2014-01-01 15:27:47 UTC

The problem was that this alert was recorded with an empty Host ID; therefore, it was not removed from the database when the manual fencing procedure was executed.

Removing the needinfo after talking with the BZ reporter and finding the cause of the BZ.
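
The cause described in this comment can be illustrated with a small Python sketch. It is not the ovirt-engine code; the `Alert` class, the `remove_alerts_for_host` helper and the "host-a" identifier are hypothetical, and the sketch simply assumes that alert removal matches on host ID and alert type.

```python
from dataclasses import dataclass
from typing import List, Optional

FENCE_ALERT = "VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED"


@dataclass
class Alert:
    log_type_name: str
    vds_id: Optional[str]  # None models the empty Host ID in the audit log


def remove_alerts_for_host(
    alerts: List[Alert], vds_id: str, log_type_name: str
) -> List[Alert]:
    """Return the alerts left after clearing one host's alerts of one type."""
    return [
        a
        for a in alerts
        if not (a.vds_id == vds_id and a.log_type_name == log_type_name)
    ]


host_id = "host-a"  # hypothetical host identifier

# Alert recorded without a host ID, as in the audit_log record in this report.
recorded_without_id = [Alert(FENCE_ALERT, None)]
# Alert recorded with the host ID filled in.
recorded_with_id = [Alert(FENCE_ALERT, host_id)]

# Manual fencing clears the alerts for host_id in both cases.
left_without_id = remove_alerts_for_host(recorded_without_id, host_id, FENCE_ALERT)
left_with_id = remove_alerts_for_host(recorded_with_id, host_id, FENCE_ALERT)
```

Under these assumptions the alert without a host ID never matches the removal filter and stays in the list, while the alert that carries the host ID is cleared.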

Comment 6 Sandro Bonazzola 2014-01-14 08:44:52 UTC

ovirt 3.4.0 alpha has been released

Comment 7 Tareq Alayan 2014-03-12 12:56:05 UTC

is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch

Comment 8 Eli Mesika 2014-03-12 14:05:23 UTC

(In reply to Tareq Alayan from comment #7)
> is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch

rhevm-3.4.0-0.3.master.el6ev.noarch is AV2 
BZ is part of AV2.1

Comment 9 Tareq Alayan 2014-03-18 12:08:20 UTC

Verified: unable to reproduce. Tested on rhevm-3.4.0-0.5.master.el6ev.noarch

Comment 11 Pablo Iranzo Gómez 2014-04-04 12:56:18 UTC

*** Bug 1084466 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2014-06-09 15:07:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html
