Bug 1595422

Summary: RFE: ability to clear stonith history
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.6
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Ken Gaillot <kgaillot>
Assignee: Klaus Wenninger <kwenning>
QA Contact: cluster-qe <cluster-qe>
CC: abeekhof, aherr, cluster-maint, kwenning, mmazoure, phagara
Keywords: FutureFeature
Target Milestone: rc
Target Release: 7.7
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.20-1.el7
Doc Type: No Doc Update
Doc Text: The relevant pcs functionality should be documented instead.
Cloned as: 1595444 (view as bug list)
Bug Blocks: 1461964, 1595444, 1608369, 1620190
Type: Bug
Last Closed: 2019-08-06 12:53:38 UTC

Description Ken Gaillot 2018-06-26 21:52:21 UTC
Description of problem: Pacemaker's fence daemon keeps a history of all fence actions taken (pending, successful, and failed), which can be displayed with the stonith_admin --history command, and soon by crm_mon (pcs status). However, there is no way to clear this history, which will become especially relevant if crm_mon starts showing fence failures by default, as expected.

A possible interface is a new stonith_admin option, e.g. --clear-history. It may be worthwhile to accept an optional argument "failures" or "all" (defaulting to "all"), or perhaps "all" should be the only behavior.

Comment 4 Ken Gaillot 2019-01-15 16:48:30 UTC
QA: The interface implemented is:

 stonith_admin --cleanup --history=NODE

where NODE can be a particular node name or '*' to clean all.
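(Editorial aside, not part of the original interface description: the '*' argument must be quoted, otherwise the shell expands it to the file names in the current directory before stonith_admin ever sees it. A minimal demonstration of the quoting behavior, using echo as a stand-in for stonith_admin:)

```shell
# Using echo as a stand-in for stonith_admin, since only the
# shell's glob expansion is being demonstrated here.
cd "$(mktemp -d)"            # fresh empty directory
touch node-01 node-02        # create files to make the expansion visible

echo --cleanup --history *   # unquoted: expands to the file names
# --cleanup --history node-01 node-02

echo --cleanup --history '*' # quoted: the literal asterisk is passed through
# --cleanup --history *
```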

Comment 5 Patrik Hagara 2019-03-20 10:53:04 UTC
A few questions regarding the expected behavior:

  * The command `stonith_admin --cleanup --history node-01` should remove all recorded fence events (pending, successful, and failed) for that node from all cluster nodes -- i.e. running `stonith_admin --history node-01` afterwards from any node should show nothing, right? (And of course the history should not reappear after being deleted, i.e. it should not be re-synced via the mechanism from bz#1555938.)

  * Since the history cleanup command is supposed to remove all fence events, including pending ones, doesn't that interfere with cluster state transitions as calculated by pengine? Assuming the stonith history is just a copy of that information and deleting it has no effect on cluster behavior -- wouldn't the cleanup prevent users from seeing pending fence actions after clearing the stonith history?

Comment 6 Klaus Wenninger 2019-03-20 12:18:41 UTC
--cleanup isn't gonna delete any pending entries as there is no separate tables for user-query and cluster behaviour.
The history is deleted on the current partition. So in theory it would be possible that the history persists on a node that wasn't part of the partition the stonith_admin-command was issued. And of course this could be synced back later on when the cluster isn't partitioned anymore.
The syncing results in all nodes of the partition (at the time the stonith_admin-command is issued) getting a superposition of all history entries available within this partition.
As there is a theoretical risk for the combination of these mechanisms leading to long history records the length of these records is trimmed to the most recent 500 events.
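(The trim-to-500 behavior described above can be illustrated with a small sketch. Illustrative only: a plain text file stands in for the fencer's in-memory history, and the entry names and counts are made up; only the keep-the-most-recent-N behavior mirrors the description.)

```shell
MAX_EVENTS=500
history_file=$(mktemp)

# Simulate 600 synced-in history entries, oldest first.
for i in $(seq 1 600); do
    echo "event-$i" >> "$history_file"
done

# Trim to the most recent MAX_EVENTS entries, analogous to what the
# fencer does with its history records.
tail -n "$MAX_EVENTS" "$history_file" > "$history_file.trimmed"

wc -l < "$history_file.trimmed"     # 500 entries remain
head -n 1 "$history_file.trimmed"   # event-101 is the oldest survivor
```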

Comment 7 Patrik Hagara 2019-03-20 12:25:19 UTC
qa-ack+

reproducer in comment#4 and corner cases to check in comment#5 and comment#6

Comment 9 Michal Mazourek 2019-06-13 15:39:13 UTC
# rpm -q pacemaker
pacemaker-1.1.20-5.el7.x86_64

# stonith_admin --help
...
-c, --cleanup          Cleanup wherever appropriate.
...
-H, --history=value    Show last successful fencing operation for named node
                        (or '*' for all nodes). Optional: --timeout, --cleanup
...

history show
============

# stonith_admin --history virt-012
virt-018 was able to reboot node virt-012 on behalf of stonith_admin.1566 from virt-018 at Thu Jun 13 15:28:01 2019

# stonith_admin --history virt-018
virt-022 was able to reboot node virt-018 on behalf of stonith_admin.12216 from virt-022 at Thu Jun 13 15:39:38 2019

history cleanup
===============

# stonith_admin --cleanup --history virt-012
cleaning up fencing-history for node virt-012

## History for virt-012 is now clear on all nodes and doesn't reappear after reboot
# stonith_admin --history virt-012
#

## History for virt-018 is still logged
# stonith_admin --history virt-018
virt-022 was able to reboot node virt-018 on behalf of stonith_admin.12216 from virt-022 at Thu Jun 13 15:39:38 2019

# stonith_admin --cleanup --history '*'
cleaning up fencing-history for node *
# stonith_admin --history '*'
#

Verified for pacemaker-1.1.20-5.el7

Comment 11 errata-xmlrpc 2019-08-06 12:53:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2129