Red Hat Bugzilla – Bug 233428
fence_egenera script should not fence node during crash dump
Last modified: 2010-10-22 09:54:52 EDT
Description of problem:
fence_egenera will fence a node during a forced or unexpected crash dump,
truncating the crash dump file.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. From PAN Manager, send an NMI to a blade in a GFS cluster
2. Blade will begin to dump core
3. Other nodes will fence the dumping node - verifiable in cluster logs
Actual results:
Truncated crash dump
Expected results:
Dumping node should not be fenced
Patch suggestions will be added. fence_egenera just needs to check for a few
additional characters to see if the node is fencing (or booting as opposed to
This is an excerpt from an email report on the problem:
This looks very easy to resolve. When a pServer has been sent an NMI and is in
the process of writing out a crash dump, the Control Blades see that its status
has changed and reflect this in the output of commands such as ‘bframe’ – the
pServer’s status changes from “Booted” to “Booted(KDB).” I would suggest that
the fence_egenera script simply be modified to look for the status of
Booted(KDB) and return success if it sees the pBlade it’s trying to fence is in
this state – meaning, the pBlade is effectively already removed from the
cluster; the fence script does not need to perform any action, so it might as
well say the pBlade was already fenced. The fact that the script itself didn’t
perform the fence and that it was performed by an external action is irrelevant.
Making this change would perhaps also prevent the fence_egenera script from
fencing a node twice - once per cBlade login from the living nodes.
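Below is a minimal sketch of the check proposed in the email. It is not the patch attached to this bug. The report does not show the layout of PAN Manager's `bframe` output, so the code assumes one pServer per line, with the pServer name in the first column and its status in the last. The helper names `pserver_status` and `already_out_of_cluster` are hypothetical.

```python
from typing import Optional

# Status string reported by the Control Blades once an NMI-triggered dump begins.
KDB_STATUS = "Booted(KDB)"


def pserver_status(bframe_output: str, pserver: str) -> Optional[str]:
    """Return the status reported for `pserver`, or None if it is not listed.

    Assumed (hypothetical) layout: one pServer per line, name in the first
    whitespace-separated column, status in the last column.
    """
    for line in bframe_output.splitlines():
        fields = line.split()
        if fields and fields[0] == pserver:
            return fields[-1]
    return None


def already_out_of_cluster(bframe_output: str, pserver: str) -> bool:
    """True if the pServer is dumping after an NMI, so no fence action is needed."""
    return pserver_status(bframe_output, pserver) == KDB_STATUS
```

Because the check only reads status, a second fence attempt from another surviving node gets the same answer and does not power-cycle the blade again. That is the double-fencing point raised at the end of the email.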
KDB sounds suspiciously like "kernel debugger". So, I have a couple of questions:
(a) Does "(KDB)" show up whenever the kernel enters the debugger?
If KDB can be entered in non-crash/non-dump cases (example below), assuming
success merely because the kernel debugger has been activated is simply *not*
sufficient to guarantee a clean fencing operation in all cases.
echo "1" >/proc/sys/kernel/kdb
(b) What happens to the blade's status when the dump completes - assuming no
other intervention? Does it sit there forever with (KDB), is it rebooted,
BTW, I agree that we shouldn't fence the node during a crash dump. The problem is that I'm not sure we can *accurately* distinguish a crash dump from a user-induced debugging session (which the user can then exit), and we would need to make that distinction.
My understanding is that the status will only show Booted(KDB) when the Control
Blade initiated an NMI. I'll try to confirm.
We could also instead just check for the log message that would be generated
for the NMI if that's easier/safer?
My above comment has been confirmed. Our Principal TSE tested this; findings in
his own words:
Entering into the kernel debugger (I am assuming running crash on a live system
here) does not result in the Status changing from "Booted" in PAN Manager. Only
an NMI or actual crash dump will result in this state change. It is safe to
trigger off "Booted(KDB)" for determination of blade status for fencing purposes.
Hope this helps. Let me know if you need more information.
Sweet. Thanks :)
This is doable. Thanks for the clarification. Will need acks and all, but I will
post a sample script to try here when I have it ready - which will be soon. Not
a tough change.
When all this started I looked at the script myself and it looked pretty
straightforward; there's a case statement whereby you're checking the status of
the pBlade before executing the fence; one of your cases is already "if the
pBlade is already down, return success and exit" -- when I looked at it, it
seemed as if one could just add another entry to the same case statement, "if
the blade has a status of Booted(KDB), return success and exit"
There may be a better way to do it, but that seemed like the easiest. :)
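The case-statement change described in this comment amounts to adding one more "already fenced" status to the existing dispatch. The sketch below shows that idea and is not the actual fence_egenera code. Only "Booted" and "Booted(KDB)" appear in this report. "Shutdown" is a hypothetical stand-in for whatever "already down" status the real script checks.

```python
from enum import Enum


class Action(Enum):
    REPORT_SUCCESS = "report success without acting"  # node already out of the cluster
    POWER_OFF = "power off the pBlade"                # normal fencing path


# "Shutdown" is hypothetical; "Booted(KDB)" is the new entry proposed in this bug.
ALREADY_FENCED = {"Shutdown", "Booted(KDB)"}


def choose_action(status: str) -> Action:
    """Map a pServer status to what the fence agent should do."""
    if status in ALREADY_FENCED:
        return Action.REPORT_SUCCESS
    # Any other status, including unrecognized ones, falls through to a real
    # fence: when in doubt, shoot.
    return Action.POWER_OFF
```

Defaulting unknown statuses to `POWER_OFF` keeps the usual fencing behavior. Only statuses that are known to mean "already out of the cluster" skip the power operation.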
Note that this solution will only handle the situation of a user-initiated
crash dump via an NMI. With this modification the server could still be fenced
when it crashes of its own accord and starts dumping, since this does not
change the status to Booted(KDB). Handling that case will be trickier, but since you're already in there working on this, do you want to get two for the price of one and take that issue out too? You'd probably have to check a log file to determine whether the pBlade is dumping.
Generally, the fence philosophy is to shoot first and ask questions later, for the sake of the data. I can see how this would be useful, but I wonder whether it would significantly delay a fence action. Usually, sysadmins ask for fencing to happen *faster*. Also, can arbitrary logs be grepped remotely through the
Switching to NEEDINFO based on Jim's question in comment #9
My apologies - I did not know you were waiting on input from me.
I'm just saying it's possible -- do you want to address it in this go-around,
or should we address in another ticket?
Logs can be analyzed via the ssh connection into the cBlade. The log file in
particular you'd be interested in is /opt/panmgr/bin/event.log. I would need to
do a little research on which messages would be the correct ones to look for;
it'd be along the lines of "Control Blade detects <lpan/pserver> is
crashdumping to file" -- if you see that message for what you're trying to
fence, then don't fence. But the question is, the message could appear in the
file but be old -- how do we tell? Certainly would need to ensure clocks are
reasonably accurate... But if I saw the message a half hour prior to the time
of the attempted fence operation... Should I still fence him? What if he's
still dumping? What about an hour before? etc...
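Below is a minimal sketch of the event.log check with a staleness cutoff. Everything about the log format is an assumption. The report gives the message only "along the lines of" "Control Blade detects <lpan/pserver> is crashdumping to file". The code therefore assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, contains the phrase "is crashdumping to file", and names the pServer as a separate `lpan/pserver` token. The function name `recent_dump_message` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumed (not confirmed by the report) log layout and message text.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LEN = 19  # len("2007-04-20 12:34:56")
DUMP_PHRASE = "is crashdumping to file"


def recent_dump_message(lines: Iterable[str], pserver: str, now: datetime,
                        max_age: timedelta) -> bool:
    """True if the newest crash-dump message for `pserver` is within `max_age` of `now`."""
    newest: Optional[datetime] = None
    for line in lines:
        # Match the pServer as a whole token so "lpan1/node1" does not match "lpan1/node10".
        if DUMP_PHRASE not in line or pserver not in line.split():
            continue
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LEN], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # unparseable timestamp: ignore the line rather than guess
        if newest is None or stamp > newest:
            newest = stamp
    return newest is not None and now - newest <= max_age
```

The sketch does not answer the open question in the comment. It turns "half an hour? an hour?" into the `max_age` parameter, and the result is only as trustworthy as the clock agreement between the cBlade and the node running the check.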
Created attachment 152508
updated fence_egenera script
This patch implements code from the comments of Dan Polar, Egenera
Created attachment 152509
updated fence_egenera script
This updated script implements code from the comments of Dan Polar, Egenera
I have done some testing, and this version of the script does the right thing when the server is dumping core, whether because of an NMI or because of a system crash. I tested this on PAN Manager 126.96.36.199, so YMMV on other revisions.
Also, is this the "right" way to do it? That is, should the fence script return success while the server is still dumping, rather than only after the box has completely gone down (or been power-cycled)? If we instead set it up so that fencing fails until the server has actually gone down, is there an error code the fencing script should return?
As far as the return code of the script goes, there is no intermediate value that says 'please wait'; it is either success or failure. If we can detect the dump state at the time of fencing, the node is very likely headed down, so I am OK with returning success.
In regards to the 'Big Enchilada' mentioned above by Dan wherein we try and
detect crash dumps initiated by the system and not the user - I fear corner
cases there. There should be a separate ticket for that, please. I wish we could just read a value in /proc or sysfs that would tell us the dump state. Do you know of any? That would probably be faster, and maybe even more reliable, than checking whether logs are growing or have a modification time within milliseconds of the current time.
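The "is the log growing / recently modified" heuristic mentioned here can be sketched with plain `os.stat` calls. This is a sketch, not anything proposed in the thread. The report does not say which file would be watched, and in practice it would live on the cBlade and be reached over ssh, so `path` and the window values are assumptions. The helper names are hypothetical.

```python
import os
import time


def file_recently_modified(path: str, window_seconds: float) -> bool:
    """True if `path` exists and its mtime is within `window_seconds` of now."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False
    return time.time() - mtime <= window_seconds


def file_is_growing(path: str, interval_seconds: float = 1.0) -> bool:
    """True if the file's size increases across a short sampling interval."""
    try:
        before = os.stat(path).st_size
        time.sleep(interval_seconds)  # this wait is added directly to fence latency
        after = os.stat(path).st_size
    except FileNotFoundError:
        return False
    return after > before
```

The `time.sleep` in `file_is_growing` makes the latency concern raised earlier in the thread concrete. Every second of sampling is a second the fence operation waits.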
I intend to take the patch in the next update of RHEL5 as well as RHEL4. The
customer should feel comfortable using the agent script they are using (the one
they submitted as an attachment to this ticket).
Is that OK? All parties happy for now? Lon?
I don't see any acks for this. Is it still on for 5.1/4.6?
This is all set for inclusion in 4.6
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.