Bug 233428 - fence_egenera script should not fence node during crash dump
Summary: fence_egenera script should not fence node during crash dump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: fence
Version: 4
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
Assignee: Jim Parsons
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 251358
Reported: 2007-03-22 13:26 UTC by Neal Pitts
Modified: 2018-10-19 23:39 UTC (History)

Fixed In Version: RHBA-2007-0996
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-21 21:54:31 UTC
Embargoed:


Attachments
updated fence_egenera script (8.26 KB, application/octet-stream)
2007-04-12 21:29 UTC, Calvin Smith
updated fence_egenera script (8.26 KB, application/octet-stream)
2007-04-12 21:30 UTC, Calvin Smith


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0996 0 normal SHIPPED_LIVE fence bug fix update 2007-11-29 14:47:20 UTC

Description Neal Pitts 2007-03-22 13:26:21 UTC
Description of problem:
fence_egenera will fence a node during a forced or unexpected crash dump,
truncating the crash dump file.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. From PAN Manager, send an NMI to a blade in a GFS cluster
2. Blade will begin to dump core
3. Other nodes will fence the dumping node - verifiable in cluster logs
  
Actual results:
Truncated crash dump

Expected results:
Dumping node should not be fenced

Additional info:
Patch suggestions will be added.  fence_egenera just needs to check for a few
additional characters to see if the node is fencing (or booting as opposed to
booted).

Comment 1 Neal Pitts 2007-03-23 00:30:03 UTC
This is an excerpt from an email report on the problem:

This looks very easy to resolve. When a pServer has been sent an NMI and is in
the process of writing out a crash dump, the Control Blades see that its status
has changed and reflect this in the output of commands such as ‘bframe’ – the
pServer’s status changes from “Booted” to “Booted(KDB).” I would suggest that
the fence_egenera script simply be modified to look for the status of
Booted(KDB) and return success if it sees the pBlade it’s trying to fence is in
this state – meaning, the pBlade is effectively already removed from the
cluster; the fence script does not need to perform any action, so it might as
well say the pBlade was already fenced. The fact that the script itself didn’t
perform the fence and that it was performed by an external action is irrelevant.

Making this change would perhaps also prevent the fence_egenera script from
fencing a node twice - once per cBlade login from the living nodes.

Comment 2 Lon Hohberger 2007-03-23 16:03:57 UTC
KDB sounds suspiciously like "kernel debugger".  So, I have a couple of questions:

(a) Does "(KDB)" show up whenever the kernel enters the debugger?

If KDB can be entered in non-crash/non-dump cases (example below), assuming
success merely because the kernel debugger has been activated is simply *not*
sufficient to guarantee a clean fencing operation in all cases.

   echo "1" >/proc/sys/kernel/kdb


(b) What happens to the blade's status when the dump completes - assuming no
other intervention?  Does it sit there forever with (KDB), is it rebooted,
turned-off, etc.?

Comment 3 Lon Hohberger 2007-03-23 16:09:09 UTC
BTW - I agree that we shouldn't fence the node during a crash dump, but the
problem is that I'm not sure we can *accurately* detect the difference between a
crash dump and a user-induced debugging session (which, the user can then exit)
- which is necessary.

Comment 4 Dan Poler 2007-03-23 23:20:53 UTC
My understanding is that the status will only show Booted(KDB) when the Control 
Blade initiated an NMI. I'll try to confirm.

We could also instead just check for the log message that would be generated 
for the NMI if that's easier/safer?

Comment 5 Dan Poler 2007-03-26 17:24:04 UTC
My above comment has been confirmed. Our Principal TSE tested this; findings in
his own words:

Entering into the kernel debugger (I am assuming running crash on a live system
here) does not result in the Status changing from "Booted" in PAN Manager. Only
an NMI or actual crash dump will result in this state change. It is safe to
trigger off "Booted(KDB)" for determination of blade status for fencing purposes.

Hope this helps. Let me know if you need more information.

Comment 6 Lon Hohberger 2007-03-30 18:47:29 UTC
Sweet.  Thanks :)

Comment 7 Jim Parsons 2007-03-30 18:51:38 UTC
This is doable. Thanks for the clarification. Will need acks and all, but I will
post a sample script to try here when I have it ready - which will be soon. Not
a tough change.

Comment 8 Dan Poler 2007-03-30 18:58:23 UTC
When all this started I looked at the script myself and it looked pretty 
straightforward; there's a case statement whereby you're checking the status of 
the pBlade before executing the fence; one of your cases is already "if the 
pBlade is already down, return success and exit" -- when I looked at it, it 
seemed as if one could just add another entry to the same case statement, "if 
the blade has a status of Booted(KDB), return success and exit"

There may be a better way to do it, but that seemed like the easiest. :)

Note that this solution will only handle the situation of a user-initiated 
crash dump via an NMI. With this modification the server could still be fenced 
when it crashes of its own accord and starts dumping, since this does not 
change the status to Booted(KDB). Handling that case will be a little more 
tricky, but if you're already in there working on this, do you want to go two 
for the price of one and take that issue out too? It'd be a little trickier; 
you'd probably have to check a log file to determine if the pBlade is dumping.
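
The case-statement change described above could be sketched roughly as follows. This is only an illustration in Python, not the fence_egenera code itself (which is not shown in this report). The function name `fence_decision` and the "Shutdown" label for an already-down pBlade are assumptions; only "Booted" and "Booted(KDB)" come from this thread.

```python
# Statuses treated as "already out of the cluster". "Booted(KDB)" is the
# status named in this report; "Shutdown" is a placeholder assumption for
# whatever label the existing script already treats as down.
NO_ACTION_STATES = frozenset({"Shutdown", "Booted(KDB)"})


def fence_decision(status: str) -> str:
    """Return 'skip' if the pBlade needs no fence action, else 'fence'."""
    if status.strip() in NO_ACTION_STATES:
        # Nothing to do: report success as if the blade were already fenced.
        return "skip"
    return "fence"
```

Under these assumptions a plain "Booted" status still leads to a fence, so a blade that crashes on its own and keeps the "Booted" status is not covered by this check.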

Comment 9 Jim Parsons 2007-03-30 19:32:22 UTC
Generally, the fence philosophy is to shoot first and ask questions later - for
the sake of data...I can see how it would be useful, but I wonder if this would
significantly delay a fence action? Usually, sysadmins are requesting fencing
happen *faster*. Also, can arbitrary logs be grepped remotely through the
management interface?

Thanks

Comment 11 Bryn M. Reeves 2007-04-12 18:43:50 UTC
Switching to NEEDINFO based on Jim's question in comment #9


Comment 12 Dan Poler 2007-04-12 20:09:22 UTC
Hi,

My apologies - I did not know you were waiting on input from me.

I'm just saying it's possible -- do you want to address it in this go-around, 
or should we address in another ticket?

Logs can be analyzed via the ssh connection into the cBlade. The log file in 
particular you'd be interested in is /opt/panmgr/bin/event.log. I would need to 
do a little research on which messages would be the correct ones to look for; 
it'd be along the lines of "Control Blade detects <lpan/pserver> is 
crashdumping to file" -- if you see that message for what you're trying to 
fence, then don't fence. But the question is, the message could appear in the 
file but be old -- how do we tell? Certainly would need to ensure clocks are 
reasonably accurate... But if I saw the message a half hour prior to the time 
of the attempted fence operation... Should I still fence him? What if he's 
still dumping? What about an hour before? etc...
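
The staleness question above could be handled with an explicit age window. The following is a rough Python sketch under several assumptions: the log has already been parsed into (timestamp, message) pairs, the message wording is the approximate one quoted above, and the name `recent_crashdump` and the `max_age` window are hypothetical. The real event.log format is not shown in this report.

```python
from datetime import datetime, timedelta


def recent_crashdump(events, pserver, now, max_age):
    """Return True if a crashdump message for pserver falls inside the window.

    events  -- iterable of (datetime, message) pairs, already parsed
    pserver -- identifier such as "lpan/pserver" (format assumed)
    now     -- time of the attempted fence operation
    max_age -- timedelta; older messages are treated as stale
    """
    cutoff = now - max_age
    for when, message in events:
        if not (cutoff <= when <= now):
            continue  # too old, or stamped in the future (clock skew)
        # Match the identifier as a whole word to avoid prefix collisions.
        if "crashdumping" in message and pserver in message.split():
            return True
    return False
```

How large `max_age` should be is exactly the open question in this comment; the sketch only makes the cutoff explicit and does not answer it.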



Comment 13 Calvin Smith 2007-04-12 21:29:52 UTC
Created attachment 152508
updated fence_egenera script

This patch implements code from the comments of Dan Poler, Egenera

Comment 14 Calvin Smith 2007-04-12 21:30:29 UTC
Created attachment 152509
updated fence_egenera script

This updated script implements code from the comments of Dan Poler, Egenera

Comment 15 Calvin Smith 2007-04-13 20:25:41 UTC
Have done some testing and this version of the script does the right thing if
the server is dumping core either through an NMI or a system crash that would
dump core. Tested this on PAN Manager 5.0.0.5, so YMMV on different revisions.

Also, is this the "right" way to do it, as in the fence script returns a
success if the server is still dumping instead of after the box has completely
gone down (or been power-cycled)? If we do set it up so that fencing fails until
the server has actually gone down, is there an error code that the fencing
script should send out? 

Comment 17 Jim Parsons 2007-05-09 20:45:33 UTC
As far as the return code of the script goes, there is no intermediate value to be
returned that says, 'please wait'. It says either success or failure. If we can
detect dump state at the time of fencing, it seems likely that the thing is
headed down; so I am OK with returning success.

In regards to the 'Big Enchilada' mentioned above by Dan wherein we try and
detect crash dumps initiated by the system and not the user - I fear corner
cases there. There should be a separate ticket for that, please. I wish we could
just detect a value in /proc or sysfs that would let us know dumpstate. Do you
know of any? This would be faster probably and maybe even more reliable than
checking to see if logs are growing/have a mod time within millis of current
time/parsing...
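
For comparison, the "mod time close to current time" check mentioned above might look like the following Python sketch. It assumes the file is readable locally, which would not hold for a log that lives on the cBlade, and the name `modified_within` and the threshold are hypothetical.

```python
import os
import time


def modified_within(path, max_age_seconds, now=None):
    """Return True if path was last modified within max_age_seconds of now."""
    if now is None:
        now = time.time()
    age = now - os.stat(path).st_mtime
    # A negative age means the mtime is in the future (clock skew);
    # treat that as "not recent" rather than guessing.
    return 0 <= age <= max_age_seconds
```

This only shows that something wrote to the file recently, not what was written, which is one of the corner cases that makes this approach less attractive than a dedicated dump-state value.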

I intend to take the patch in the next update of rhel5 as well as rhel4. The
customer should feel comfortable using the agent script they are using (the one
they submitted as an attachment to this ticket). 

Is that OK? All parties happy for now? Lon?

Comment 18 Jason Willeford 2007-07-25 18:52:22 UTC
Jim,
I don't see any acks for this.  Is it still on for 5.1/4.6?

Comment 20 Jim Parsons 2007-08-08 16:36:57 UTC
This is all set for inclusion in 4.6

Comment 23 errata-xmlrpc 2007-11-21 21:54:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0996.html


