309991 – Add coordination between Kdump and Cluster Fencing for long kernel panic dumps

Summary: Add coordination between Kdump and Cluster Fencing for long kernel panic dumps

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	cman
Sub Component:
Version:	5.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:	461948
Blocks:	585266 585332

Reported:	2007-09-27 21:12 UTC by Scott Crenshaw
Modified:	2016-04-26 14:48 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Clones:	585266
Environment:
Last Closed:	2011-09-30 15:02:45 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Rob Kenna 2007-09-27 21:12:42 UTC

With large memory configurations, some machines take a long time to dump state
when a panic occurs.  The cluster software may well force a reboot as a fence
operation before the dump completes.  This causes the loss of data needed to
diagnose the root problem.

Cluster fencing needs a mechanism to hold off fencing until the dump completes,
or an assurance from the failed node that it will not re-awaken and corrupt
shared data.

Comment 1 Neil Horman 2007-09-28 11:56:30 UTC

I've added, as part of bz 269761, the ability to run an arbitrary script from
the kdump initrd prior to capturing a vmcore.  My thought was that we could use
this ability to fork a process that spoke to the cluster suite peer daemons in
such a way as to stall the fencing process.  This obviously requires that the
fencing suite contain some utility to drive the communication appropriately,
which can then be added to kdump via /etc/kdump.conf.  Thoughts, Jim?

Comment 3 Jason Willeford 2008-01-21 21:21:52 UTC

Is there any status on this feature request Jim / Rob?  What are the next steps?

Comment 4 Rob Kenna 2008-01-29 18:40:26 UTC

This is now targeted for RHEL 5.3

Comment 9 Jim Parsons 2008-07-17 19:33:45 UTC

But, when a node is crashing, you want it DOWN. That is the reason for fencing
in the first place. Are you really willing to risk your data?

Comment 10 Neil Horman 2008-07-17 19:48:49 UTC

Please read the initial comment on this bug. The reason this bug was opened was
because the fencing functionality of our cluster suite was power-cycling a
crashed box before kdump could complete running on it and collect the system
vmcore for a post-mortem analysis.  I agree that you want to fence nodes that
are crashed, but you also want to figure out why they crashed in the first
place, and you can't do that if the cluster reboots your crashed system before it
can record its memory image.  Including a utility/script with the cluster suite
that prevents fencing, for use by the kdump_pre directive, is, in my view, a good
way to do that.  Whether or not it makes sense in a given environment is up to
its sysadmin.

Comment 13 David Teigland 2009-07-20 15:31:21 UTC

I think the best way to deal with this problem is to use storage fencing rather than power fencing.

If there really is a need to do power fencing and delay it until kdump is done, then I like the idea of using a kdump hook to start a program that will broadcast periodic status messages on the progress of the kdump.  The other cluster nodes would monitor this and delay fencing.  I'm not sure if fencing would be considered successful once kdump was done without doing anything else (as done by stonith plugin below), or if we'd want to do the power fencing after kdump completed.
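The status-broadcast idea above can be sketched roughly as follows. This is only an illustration, not part of cman or kdump: the port number, the JSON message fields, the state names ("dumping", "done") and the 30-second staleness window are all assumptions made for the example. The sender side would run from the kdump hook; the surviving nodes would feed received datagrams into the monitor and ask it whether to hold off fencing.

```python
import json
import socket

STATUS_PORT = 7410  # hypothetical port, chosen only for this sketch


def build_status(node, state, seq):
    """Encode one status message; the field names are assumptions."""
    return json.dumps({"node": node, "state": state, "seq": seq}).encode("utf-8")


def broadcast_status(node, state, seq, addr="255.255.255.255", port=STATUS_PORT):
    """Send one status datagram to the local broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_status(node, state, seq), (addr, port))
    finally:
        sock.close()


class DumpMonitor:
    """Tracks the last status seen from each node (runs on surviving nodes)."""

    def __init__(self, stale_after=30.0):
        self.stale_after = stale_after  # seconds without a message before giving up
        self.last_seen = {}             # node name -> (state, time received)

    def record(self, payload, now):
        """Store the state carried by one received datagram."""
        msg = json.loads(payload.decode("utf-8"))
        self.last_seen[msg["node"]] = (msg["state"], now)

    def should_delay_fencing(self, node, now):
        """Delay only while the node reports 'dumping' and the report is fresh."""
        entry = self.last_seen.get(node)
        if entry is None:
            return False
        state, seen = entry
        return state == "dumping" and (now - seen) <= self.stale_after
```

If the status messages stop arriving (for example because the kdump kernel hung), the monitor stops asking for a delay once the staleness window passes, so fencing is postponed but never blocked indefinitely.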

A third option is for remaining nodes to log into the kdumping node to monitor its progress.  NTT implemented this as a pacemaker stonith plugin:
http://hg.linux-ha.org/dev/file/tip/lib/plugins/stonith/external/README_kdumpcheck.txt
http://hg.linux-ha.org/dev/file/64f4592952ea/lib/plugins/stonith/external/kdumpcheck.in

Comment 14 David Teigland 2009-07-31 16:50:28 UTC

This is a duplicate of bug 461948; I don't know which to close.

Comment 15 Calvin Smith 2009-08-07 22:22:24 UTC

From bz 461948:

Not a duplicate.  One is for RHEL5 and one is for RHEL6.

Comment 19 Lon Hohberger 2010-01-27 15:40:56 UTC

Closing as a duplicate of 461948 since bug 461948 has a more complete set of design ideas.

*** This bug has been marked as a duplicate of bug 461948 ***

Comment 20 Calvin Smith 2010-03-01 14:08:29 UTC

Instead of closing this one, it should be blocked by 461948, since this ticket is for a different version of the OS. It stands to reason that this has to be developed in RHEL 6 before deciding whether to backport it to RHEL 5, but I don't agree that this ticket should be closed until a decision is made as to whether this will be fixed in RHEL 5 at all.

Comment 21 Lon Hohberger 2010-04-23 15:04:27 UTC

Calvin, you're right.

Comment 23 Jaroslav Kortus 2010-09-15 16:54:14 UTC

Isn't the node in kdumping state already effectively fenced?

It loads a completely new kernel and mounts only local storage (or, more generally, whatever is defined in kdump.conf). In this state it can hardly touch any shared resource unless you set it up that way.

I'm thinking of these conditions:
 * kdump has a way to notify the cluster nodes that it has booted
 * cluster fencing is configurable so it waits up to XX secs or until kdump echo, whichever comes first
 * kdump echo is configurable and off by default
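A minimal sketch of the "wait up to XX secs or until kdump echo, whichever comes first" condition follows. It is an illustration under stated assumptions, not an existing fencing agent: the echo is assumed to be a UDP datagram whose payload is the node name, and the function names and the return values "kdump_confirmed" and "power_fence" are hypothetical.

```python
import socket
import time


def udp_receiver(sock):
    """Adapt a bound UDP socket to the recv(timeout) callable used below."""
    def recv(timeout):
        sock.settimeout(timeout)
        try:
            data, _addr = sock.recvfrom(1024)
        except socket.timeout:
            return None
        return data
    return recv


def wait_for_kdump_echo(recv, expected_node, timeout_secs, clock=time.monotonic):
    """Return True if expected_node's echo arrives before the timeout expires."""
    deadline = clock() + timeout_secs
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        data = recv(remaining)
        if data is None:          # receiver timed out
            return False
        # Ignore echoes from other nodes and keep waiting for ours.
        if data.decode("utf-8", "replace").strip() == expected_node:
            return True


def fence_decision(echo_enabled, recv, node, timeout_secs):
    """Echo handling is off by default; without an echo, fall back to power fencing."""
    if echo_enabled and wait_for_kdump_echo(recv, node, timeout_secs):
        return "kdump_confirmed"
    return "power_fence"
```

On a surviving node, recv would be built with udp_receiver() around a socket bound to the agreed port. What the cluster does after "kdump_confirmed" (treat the node as fenced, or power-fence it once the dump is done) is left open here, as it is in the discussion.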

Then the customer willing to capture kdumps is aware of what he's doing, as he must be able to configure fencing, know the consequences, and enable and configure kdump. Moreover, recovery is not limited by kdump in any way other than the configurable timeout until kdump starts.

Kdump restarts the machine at the end allowing normal operations to continue (cluster rejoin, migrate back the services, etc.). It can also hang and not restart the machine, but that's the risk you are willing to take when you enable it.

What do you think?

Comment 24 Rick Beldin 2010-09-15 19:16:33 UTC

I think you are right that once we have exec'd the new kernel we should probably not have access to any cluster resources, but I wonder if that is always the case.  For example, while I expect that the kdump kernel would not have support for gfs/gfs2, it probably needs access to an ext3 fs in order to write out the dump.  Is it at all possible that the kexec'd kernel still has access to a cluster resource, like an ext3 fs that is managed by RHCS?  If so, this could be a problem.

Overall though, the reason that people want kdump is to determine root cause on failures.  Many customers have strict policies on when to return systems to service after failures and want to have as much information as they can about the failure in order to understand the risk and exposure of the bug they may have just experienced.  I would expect most customers running RHCS in a high-availability type of situation would want to do so.

It seems like there should be some guidelines for configurations and some limitations documented for the combination of kdump and RHCS.

Comment 25 Lon Hohberger 2010-09-15 19:26:19 UTC

There was already a design in the other bugzilla.  I figure the design would be used in both places.
