Description of problem:
In the event of a full host power outage (including fence devices), a "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is added to the audit log after 18 failed fencing attempts. The alert is not removed once the problem is resolved and the host is restarted.

Version-Release number of selected component (if applicable):
rhevm-3.2.3-0.43.el6ev.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Remove all power to an active host, including any fence agents that are configured.
2. Wait 9 minutes for the reconnection timeout to elapse and the fencing attempts to begin.
3. Keep waiting until 18 fencing attempts have been made.
4. Observe that the "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is added to the audit log.
5. Restore power to the host.
6. Restart the host and select "Confirm host has been rebooted".
7. In the webadmin portal, edit the host, go to the "Power Management" tab and click the Test button to verify that fencing works.

Actual results:
The "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert stays in the audit log.

Expected results:
The "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alert is removed from the audit log.

Additional info:
2013-12-13 12:54:36,972 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (pool-4-thread-48) [514c9cdf] FINISH, FenceVdsVDSCommand, return: Test Failed, Getting status of IPMI:1.2.3.4...Chassis power = Unknown Failed , log id: 1d1778a8
[...]
2013-12-13 12:54:36,974 ERROR [org.ovirt.engine.core.bll.FenceVdsBaseCommand] (pool-4-thread-48) [514c9cdf] Failed to verify host <hostname> stop status. Have retried 18 times with delay of 10 seconds between each retry.
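The ERROR line above says the engine re-checked the host's power status 18 times, 10 seconds apart, before giving up and raising the alert. A minimal Python sketch of that kind of bounded polling loop follows. The function name `verify_fence_status`, the injectable `get_status` and `sleep` callables, the expected status string, and the exact retry semantics are all assumptions for illustration, not the engine's actual Java code.

```python
import time
from typing import Callable

# Values taken from the log message above; the engine's real configuration
# keys and defaults are not shown in this report (hypothetical constants).
MAX_RETRIES = 18
DELAY_SECONDS = 10


def verify_fence_status(
    get_status: Callable[[], str],
    expected: str,
    retries: int = MAX_RETRIES,
    delay: float = DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns `expected` or retries run out.

    Returns True on success. On False, the caller would raise the
    "Failed to verify Host ... status" alert (hypothetical flow).
    """
    for attempt in range(retries):
        if get_status() == expected:
            return True
        # Wait between attempts, but not after the final one.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With a host whose fence agent has no power, `get_status` keeps answering something like "Unknown", so the loop exhausts all 18 attempts and returns False after 17 waits of 10 seconds with these defaults. Injecting `sleep` keeps tests fast.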
engine=> select * from audit_log where message like 'Failed to verify Host%';
-[ RECORD 1 ]-------+------------------------------------------------------------------------------------
audit_log_id        | 7388
user_id             | 00000000-0000-0000-0000-000000000000
user_name           |
vm_id               | 00000000-0000-0000-0000-000000000000
vm_name             |
vm_template_id      |
vm_template_name    |
vds_id              |
vds_name            |
log_time            | 2013-12-13 06:54:36.972-05
log_type_name       | VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED
log_type            | 9005
severity            | 10
message             | Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually.
processed           | f
storage_pool_id     |
storage_pool_name   |
storage_domain_id   |
storage_domain_name |
vds_group_id        | 00000000-0000-0000-0000-000000000000
vds_group_name      |
correlation_id      |
job_id              |
quota_id            |
quota_name          |
gluster_volume_id   | 00000000-0000-0000-0000-000000000000
gluster_volume_name |
origin              | oVirt
custom_event_id     | -1
event_flood_in_sec  | 30
custom_data         |
deleted             | f
(In reply to Julio Entrena Perez from comment #0)
I don't think this is a bug. The event just records that this occurred. The only alerts that are removed are the ones indicating that PM is not configured or is configured improperly; those change once the PM configuration is changed and saved, or tested again.
(In reply to Eli Mesika from comment #2)
> those will change once the PM
> configuration is changed and saved or tested again.

The customer has already saved the Power Management settings multiple times, but the "Failed to verify Host <hostname> Restart status, Please Restart Host <hostname> manually." alerts remain, and so does this bug.
(In reply to Julio Entrena Perez from comment #3)
> (In reply to Eli Mesika from comment #2)
> > those will change once the PM
> > configuration is changed and saved or tested again.
>
> Customer has already saved Power Management settings multiple times but
> "Failed to verify Host <hostname> Restart status, Please Restart Host
> <hostname> manually." alerts remain, so does this bug.

Those alerts are removed when the host is fenced manually, i.e. in the UI, right-click the host and select "Confirm that Host has been rebooted". This clears those alerts.
It requires that you actually reboot the host manually first, as stated in the dialog message.
Please let me know if it works for you.
The problem was that when this alert was recorded, it was recorded with an empty host ID; therefore, it was not removed from the database when the manual fencing procedure was executed.

Removing the needinfo after talking with the BZ reporter and identifying the cause of the BZ.
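The root cause described above can be reproduced in miniature: a cleanup that filters alerts by host ID cannot match a row whose host ID was never set, because in SQL `NULL = 'x'` is never true. The sketch below uses Python's built-in `sqlite3` module and a heavily simplified, hypothetical `audit_log` table; `record_alert`, `clear_host_alerts` and `remaining_alerts` are illustrative helpers, and the DELETE query stands in for whatever the engine actually runs on "Confirm host has been rebooted".

```python
import sqlite3
from typing import Optional

ALERT_TYPE = "VDS_ALERT_FENCE_STATUS_VERIFICATION_FAILED"

# In-memory stand-in for the engine's audit_log table; only the columns
# needed for this illustration are kept (hypothetical simplification).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    " audit_log_id INTEGER PRIMARY KEY,"
    " vds_id TEXT,"
    " log_type_name TEXT NOT NULL)"
)


def record_alert(vds_id: Optional[str]) -> None:
    """Store a fence-verification alert; vds_id=None mimics the bug."""
    conn.execute(
        "INSERT INTO audit_log (vds_id, log_type_name) VALUES (?, ?)",
        (vds_id, ALERT_TYPE),
    )


def clear_host_alerts(vds_id: str) -> int:
    """Hypothetical cleanup run after a manual reboot confirmation.

    It deletes only alerts tagged with this host's id, so a row whose
    vds_id is NULL never matches. Returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM audit_log WHERE vds_id = ? AND log_type_name = ?",
        (vds_id, ALERT_TYPE),
    )
    return cur.rowcount


def remaining_alerts() -> int:
    """Count fence-verification alerts still present in the table."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM audit_log WHERE log_type_name = ?",
        (ALERT_TYPE,),
    ).fetchone()
    return count
```

If one alert is recorded with `vds_id=None` and another with the host's ID, `clear_host_alerts` removes only the second one, and the first stays behind exactly like the record in the psql output above. Recording the alert with the host ID, as the fix presumably does, lets the existing cleanup find it.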
oVirt 3.4.0 alpha has been released.
Is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch?
(In reply to Tareq Alayan from comment #7)
> is this merged into rhevm-3.4.0-0.3.master.el6ev.noarch

rhevm-3.4.0-0.3.master.el6ev.noarch is AV2; this BZ is part of AV2.1.
Verified: unable to reproduce. Tested on rhevm-3.4.0-0.5.master.el6ev.noarch.
*** Bug 1084466 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0506.html