Red Hat Bugzilla – Bug 461948
Provide a method to wait for kdump to complete from fencing
Last modified: 2016-04-26 10:38:26 EDT
Description of problem:
We would like to see a way to delay fencing of a node if the node is in the kdump environment.
Quasi-similar functionality exists today in the form of "post_fail_delay". Unfortunately, this feature is a hard-stop, and does not allow:
* early termination if the dump completes quicker than the timeout, or
* extension of the timeout if the dump has not completed.
This request has two parts:
(1) Kernel (kdump environment) side
* Provide method to periodically send network packets to notify cluster of kdump status.
* (optional) Provide method to indicate kdump completion to the cluster.
(2) Cluster (fence) side
* Provide a method to listen for kdump "I am still dumping" packets
* Extend timeout when these packets are received
* If timeout exceeded, proceed with fencing.
* (optional) Proceed with fencing immediately if kdump completion packet is received.
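The status packets described above could be as simple as a fixed header plus a message type. A rough sketch (the magic value, version field, and message types here are made up for illustration, not an actual wire format):

```python
import struct

# Hypothetical packet layout: 4-byte magic, 4-byte version, 4-byte type.
MAGIC = 0x4B444D50          # arbitrary marker to reject stray datagrams
MSG_DUMPING = 1             # "I am still dumping"
MSG_DONE = 2                # optional: dump completed

def pack_msg(msg_type, version=1):
    """Build a 12-byte status packet in network byte order."""
    return struct.pack("!III", MAGIC, version, msg_type)

def unpack_msg(data):
    """Parse a status packet; raise if the magic doesn't match."""
    magic, version, msg_type = struct.unpack("!III", data[:12])
    if magic != MAGIC:
        raise ValueError("not a kdump status packet")
    return version, msg_type
```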
This feature should be *disabled by default*, as it delays recovery processing in a high availability cluster. Delaying cluster recovery restricts or interrupts access to SANs and cluster file systems. Additionally, it negatively affects application availability (in HA failover environments). Customers wishing to use this feature must carefully weigh the benefits of obtaining crash dumps against the benefits of faster cluster recovery.
The listener on the cluster side could be implemented as a fence agent which listens for the packets (rather than complicating fenced). If the listener is implemented as a fence agent, no core cluster infrastructure changes should be necessary and this option could be enabled/disabled in the cluster configuration at run-time.
* It can take 15-20 seconds before the kdump environment boots to the point of being able to send "don't taze me bro" packets (vgoyal).
* The cluster's default timeout is 10 seconds in RHEL5.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.
This is a duplicate of bug 309991, I don't know which to close.
Same suggestion as the RHEL5 equivalent, bz 309991 -- use SAN fencing. And if someone wants a form of automatic reboot, call "reboot" at the end of the kdump routine, or set up a watchdog to monitor the kdump and reboot when it's done.
Modifying Lon's original description a bit based on irc discussion:
(1) kdump side
kdump environment is required to have all the same networks configured in the same way as the normal cluster environment
kdump environment starts a new daemon fence_dump_ack which broadcasts DONTTAZEMEBRO packets on all networks every N seconds, as long as kdump continues to run properly.
fence_dump_ack must not be starved by the heavy disk I/O.
I don't know what the complicating factors would be on this side.
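Something like this, roughly, for the kdump-side sender (the port, payload, and interval are assumptions for illustration, and this sketch broadcasts on one socket rather than on every configured network):

```python
import socket
import time

PORT = 7410        # hypothetical predefined port
INTERVAL = 10      # "every N seconds" from the design notes

def send_alive(sock, addr=("<broadcast>", PORT), payload=b"DONTTAZEMEBRO"):
    """Send one status datagram."""
    sock.sendto(payload, addr)

def run(dump_in_progress):
    """Broadcast until dump_in_progress() reports the dump has ended."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    try:
        while dump_in_progress():
            send_alive(sock)
            time.sleep(INTERVAL)
    finally:
        sock.close()
```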
(2) Cluster (fence) side
new fence agent fence_dump_check listens for broadcast messages on the cluster network
Upon receiving a dump_ack message, fence_dump_check compares the source IP to the IP that the victim node name resolves to; if they match, it returns 0 (success) and the fencing is complete.
fence_dump_check waits up to M seconds for a matching dump_ack; if none arrives, it returns 1 (failure), the fence operation fails, and fenced moves on to the next configured fence agent.
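The cluster-side check boils down to a timed receive loop. A simplified sketch (port and payload handling are assumptions, not the shipped agent):

```python
import socket
import time

def dump_check(victim_ip, port, timeout):
    """Wait up to `timeout` seconds for a datagram from victim_ip.
    Returns 0 (fence success) if one arrives, 1 (failure) otherwise."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return 1            # no dump_ack: fenced tries the next agent
            sock.settimeout(remaining)
            try:
                _data, (src, _sport) = sock.recvfrom(64)
            except socket.timeout:
                return 1
            if src == victim_ip:
                return 0            # victim is dumping: fencing is complete
    finally:
        sock.close()
```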
Honestly, this doesn't seem as complicated as I feared it might be, although there could be some nasty situations that come up in implementation and testing.
One complicating factor will be non-crash fencing situations, where the failed/victim node is not kdumping, e.g. startup fencing, fencing due to network disruption or partition. In each of these cases we'll be running fence_dump_check for M seconds for each victim that's not actually dumping before getting to the "real" fence agent. SAN fencing still seems like a better solution in general, but if that's not available, this may work passably.
Questions for the kdump experts here...
1. How do we ensure that kdump network environment mirrors the real environment?
2. What can/can't we run as part of the kdump pre script? Will running a daemon that sends out broadcast packets on all interfaces work or will that be problematic?
Question for dct:
Your described implementation of fence_dump_ack means that if kdump is successful the machine is never rebooted via a power fence. Would it be useful for fence_dump_ack to return success so that the cluster can continue but also spawn a separate thread/process that waits for the final broadcast message from the dumping node so that it can call the secondary fence agent that will actually power cycle the node?
Sorry I've never used kdump, but doesn't it reboot the machine when the dump completes?
Spawning something on the fencing node to monitor the remote kdump and power cycle the machine when it finishes sounds really ugly. The machine doing the kdump should really be responsible for rebooting itself one way or another, even if that means using a watchdog for the worst situations.
Agreed. Assuming kdump reboots the machine then we don't need what I described above. Just the simple solution you outlined.
If you want fencing to reboot the node after kdump completes, you can simply have two different sorts of packets which fence_dump_check can wait for:
- WAIT - sent periodically when kdump is running
- DONE - sent after kdump completes
If fence_dump_check is not configured to honor WAIT packets, then it will exit immediately (success) after the first WAIT packet is received. At that point, the node will have entered the kdump environment, replacing the old kernel.
If fence_dump_check is configured to honor 'WAIT' packets (e.g. fence_dump_check wait="1" or whatever), then WAIT can extend wait time. We don't return until (a) timeout or (b) DONE is received. If users decide to use kdump in this way, other fencing agents at the same level may be used to reboot the node after the DONE packet is received without relying on further modifications to or requirements on the kdump environment.
Also, we could have a meta attribute passed in to fencing agents generated by fenced - for example, the number of remaining fencing agents for this method/level - making the requirement to configure a special attribute for fence_dump_check obsolete.
E.g. if meta_remaining > 0, then honor WAIT packets, else don't.
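In agent terms, the WAIT/DONE handling could look something like this (message names, port handling, and the honor_wait flag are illustrative assumptions, not the shipped agent):

```python
import socket
import time

def dump_check_wait(victim_ip, port, timeout, honor_wait=False):
    """honor_wait=False: succeed on the first packet from the victim.
    honor_wait=True: WAIT packets extend the deadline; only DONE (or
    the timeout expiring) ends the wait."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return 1                    # timeout: fencing proceeds
            sock.settimeout(remaining)
            try:
                data, (src, _sport) = sock.recvfrom(64)
            except socket.timeout:
                return 1
            if src != victim_ip:
                continue
            if not honor_wait or data == b"DONE":
                return 0                    # success: next agent can power cycle
            if data == b"WAIT":
                deadline = time.monotonic() + timeout   # extend the window
    finally:
        sock.close()
```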
If we really need to insert a reboot into the process, just have fence_dump_ack call reboot when it sees the dump finish... but I still think kdump itself may prefer to be in charge of that.
(BTW, I've been thinking for a while that fence_dump_ack is looking an awful lot like a watchdog monitor. I wonder if we could just configure the watchdog program to: monitor the kdump, reboot if the dump stalls, reboot after the dump completes, and periodically call fence_dump_ack, which would just send the broadcast.)
This feature would require substantial work if it was to be done.
*** Bug 309991 has been marked as a duplicate of this bug. ***
Andrew pointed me at this:
It's not the same design we had in mind, but it's a lot less code to write.
This one creates a special user in the kdump environment and copies in a SSH key. The cluster then uses ssh to connect to the kdump environment and check whether the host is dumping or not.
There are some issues with the implementation:
1) the stonith module appears to wait forever if it fails to connect
2) the stonith module is a stonith module and not usable by fenced at this point
I have not checked whether the patch to mkdumprd still works; the patch was submitted to linux-ha-dev in 2008, shortly after the cluster summit in Prague.
That's not a bad idea, although I'm a little hesitant to start an sshd server while we're kdumping. Nominally the kdump kernel/initrd is going to be operating in a very restrictive memory environment (typically 128MB). If we're doing dump filtering (which is memory intensive) and we need to service ssh operations in parallel, we might be looking at out-of-memory conditions, or at least some failed allocations. It would be better if the dumping system could initiate an action to prevent it from being fenced. That way we can serialize the fence-prevention and dump-capture operations and save on memory use.
That said, it won't hurt to test this patch out and see how it goes.
Notes from Lon in a discussion he had with Neil/Subhendu:
Issues on the ssh implementation that Andrew noted.
It is very sensitive to:
- ssh key synchronization
- UID changes
- key changes
Also, adding sshd greatly increases:
- memory & dump image footprints
Now, it turns out this model may be preferable - that is, have the cluster connect to the dumping machine rather than wait for a special packet. Specifically, we can likely use nc to implement a simple server on a predefined IP port. This will save a lot on memory and on-disk footprint for the dump ramdisk, because nc is a built-in for busybox.
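The cluster-side half of that model is just a connect test: the kdump initrd runs a trivial TCP listener (e.g. busybox nc -l on a predefined port), and the agent checks whether a connection is accepted. A sketch, with the port number assumed:

```python
import socket

DUMP_PORT = 7410    # hypothetical predefined port

def node_is_dumping(host, port=DUMP_PORT, timeout=3.0):
    """True if the host accepts a TCP connection on the dump port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True     # listener answered: kdump environment is up
    except OSError:
        return False        # refused or timed out: not (yet) dumping
```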
Nobody is available to work on this; kicking it down the road.
Pushed new fence_kdump agent to fence-agents repo, both master and RHEL6 branch.
Created attachment 516371 [details]
Using and testing the fence_kdump agent
Here is a write-up about how to use/test the new fence_kdump agent. I've described how you can test fence_kdump and fence_kdump_send independently as well as how to test it in a cluster with the kdump service enabled. Please send questions and comments.
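For orientation, the usual shape of such a configuration is a kdump method tried before the real power fencing method, so fenced falls through to power cycling if no "still dumping" message arrives. A hypothetical cluster.conf excerpt (see the attached write-up for the authoritative syntax; device names and the fence_apc attributes here are placeholders):

```xml
<clusternode name="node1" nodeid="1">
  <fence>
    <method name="kdump">
      <device name="kdump-check"/>
    </method>
    <method name="power">
      <device name="apc" port="1"/>
    </method>
  </fence>
</clusternode>
<fencedevices>
  <fencedevice name="kdump-check" agent="fence_kdump"/>
  <fencedevice name="apc" agent="fence_apc" ipaddr="..." login="..." passwd="..."/>
</fencedevices>
```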
Created attachment 516744 [details]
Using and testing the fence_kdump agent
Updated the usage/testing write-up to describe how to change the behavior of fence_kdump_send.
This would be handy on RHEL5 too.
Dave, the issue with fencing (power fencing) when a crashed node is running kdump is that, half the time, the node is rebooted before the vmcore is complete.
Kdump supposedly reboots the machine, but that doesn't always happen...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.