Description of problem:
fence_egenera will fence a node during a forced or unexpected crash dump, truncating the crash dump file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. From PAN Manager, send an NMI to a blade in a GFS cluster
2. The blade will begin to dump core
3. The other nodes will fence the dumping node - verifiable in the cluster logs

Actual results:
Truncated crash dump

Expected results:
The dumping node should not be fenced

Additional info:
Patch suggestions will be added. fence_egenera just needs to check for a few additional characters in the node's status to see whether the node is fencing (or booting, as opposed to booted).
This is an excerpt from an email report on the problem:

This looks very easy to resolve. When a pServer has been sent an NMI and is in the process of writing out a crash dump, the Control Blades see that its status has changed and reflect this in the output of commands such as 'bframe' - the pServer's status changes from "Booted" to "Booted(KDB)". I would suggest that the fence_egenera script simply be modified to look for the "Booted(KDB)" status and return success if it sees that the pBlade it is trying to fence is in this state. In that state the pBlade is effectively already removed from the cluster; the fence script does not need to perform any action, so it might as well report that the pBlade was already fenced. The fact that the script itself didn't perform the fence, and that the fencing was performed by an external action, is irrelevant. Making this change would perhaps also prevent the fence_egenera script from fencing a node twice - once per cBlade login from the living nodes.
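In sketch form, the suggested check might look something like this (a hypothetical shell fragment - the real fence_egenera logs into the cBlade over ssh and parses command output, and the bframe invocation, login account, and field position below are assumptions):

    # Hypothetical sketch: ask the cBlade for the pServer's state and treat
    # "Booted(KDB)" (crash dump in progress) as already fenced.
    state=$(ssh panmgr@"$cblade" "bframe" | grep "$pserver" | awk '{ print $NF }')
    if [ "$state" = "Booted(KDB)" ]; then
        # The blade is dumping core and is already out of the cluster;
        # report success instead of power-cycling it mid-dump.
        exit 0
    fi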
KDB sounds suspiciously like "kernel debugger". So, I have a couple of questions:

(a) Does "(KDB)" show up whenever the kernel enters the debugger? If KDB can be entered in non-crash/non-dump cases, for example

    echo "1" > /proc/sys/kernel/kdb

then assuming success merely because the kernel debugger has been activated is simply *not* sufficient to guarantee a clean fencing operation in all cases.

(b) What happens to the blade's status when the dump completes, assuming no other intervention? Does it sit there forever with (KDB), is it rebooted, turned off, etc.?
BTW - I agree that we shouldn't fence the node during a crash dump, but the problem is that I'm not sure we can *accurately* distinguish a crash dump from a user-induced debugging session (which the user can then exit) - and that distinction is necessary.
My understanding is that the status will only show Booted(KDB) when the Control Blade initiated an NMI. I'll try to confirm. Alternatively, we could check for the log message that the NMI generates, if that's easier/safer?
My above comment has been confirmed. Our Principal TSE tested this; his findings, in his own words:

Entering the kernel debugger (I am assuming running crash on a live system here) does not result in the Status changing from "Booted" in PAN Manager. Only an NMI or an actual crash dump will result in this state change. It is safe to trigger off "Booted(KDB)" when determining blade status for fencing purposes.

Hope this helps. Let me know if you need more information.
Sweet. Thanks :)
This is doable. Thanks for the clarification. Will need acks and all, but I will post a sample script to try here when I have it ready - which will be soon. Not a tough change.
When all this started I looked at the script myself, and it looked pretty straightforward: there's a case statement that checks the status of the pBlade before executing the fence, and one of the cases is already "if the pBlade is already down, return success and exit". When I looked at it, it seemed as if one could just add another entry to the same case statement: "if the blade has a status of Booted(KDB), return success and exit". There may be a better way to do it, but that seemed like the easiest - see the sketch below. :)

Note that this solution will only handle a user-initiated crash dump via an NMI. With this modification the server could still be fenced when it crashes of its own accord and starts dumping, since that does not change the status to Booted(KDB). Handling that case will be trickier, but if you're already in there working on this, do you want to go two for the price of one and take that issue out too? You'd probably have to check a log file to determine whether the pBlade is dumping.
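A sketch of that extra case entry (hypothetical - the other status labels and the do_fence helper are made up for illustration; the real script's statuses may be spelled differently):

    # Hypothetical sketch of the status dispatch described above.
    case "$state" in
        "Shutdown"|"Off")       # already down - nothing to do
            exit 0 ;;
        "Booted(KDB)")          # dumping core - treat as already fenced
            exit 0 ;;
        "Booted")               # alive - proceed with the fence action
            do_fence ;;
    esac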
Generally, the fence philosophy is to shoot first and ask questions later - for the sake of the data... I can see how it would be useful, but I wonder whether this would significantly delay a fence action? Usually, sysadmins are asking for fencing to happen *faster*. Also, can arbitrary logs be grepped remotely through the management interface? Thanks
Switching to NEEDINFO based on Jim's question in comment #9
Hi, my apologies - I did not know you were waiting on input from me. I'm just saying it's possible - do you want to address it in this go-around, or should we address it in another ticket?

Logs can be analyzed via the ssh connection into the cBlade. The log file you'd be interested in is /opt/panmgr/bin/event.log. I would need to do a little research on which messages would be the correct ones to look for; it'd be along the lines of "Control Blade detects <lpan/pserver> is crashdumping to file" - if you see that message for the blade you're trying to fence, then don't fence it.

But the question is: the message could appear in the file and simply be old - how do we tell? We'd certainly need to ensure the clocks are reasonably accurate... And if I saw the message half an hour before the attempted fence operation, should I still fence the node? What if it's still dumping? What about an hour before? Etc.
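A rough sketch of what such a check might look like - the message text, the log timestamp format, the use of GNU date -d, and the 60-second freshness window are all assumptions pending the research mentioned above:

    # Hypothetical sketch: look for a recent "crashdumping" event for this
    # pServer in the cBlade's event log. A real version must match PAN
    # Manager's actual message text and timestamp layout.
    now=$(date +%s)
    entry=$(ssh panmgr@"$cblade" \
        "grep 'crashdumping' /opt/panmgr/bin/event.log | grep '$lpan/$pserver' | tail -1")
    if [ -n "$entry" ]; then
        # Assume the timestamp is the first two fields of the log line.
        stamp=$(date -d "$(echo "$entry" | awk '{ print $1, $2 }')" +%s)
        # Only trust a fresh message (under 60 seconds old) so a stale
        # entry from an earlier dump doesn't suppress a needed fence.
        if [ $((now - stamp)) -lt 60 ]; then
            exit 0   # the blade is dumping - do not fence it
        fi
    fi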
Created attachment 152508 [details] updated fence_egenera script This patch implements code from the comments of Dan Polar, Egenera
Created attachment 152509 [details] updated fence_egenera script This updated script implements code from the comments of Dan Polar, Egenera
Have done some testing, and this version of the script does the right thing if the server is dumping core, whether through an NMI or a system crash that dumps core. Tested this on PAN Manager 5.0.0.5, so YMMV on different revisions. Also, is this the "right" way to do it - that is, should the fence script return success while the server is still dumping, rather than only after the box has completely gone down (or been power-cycled)? If we do set it up so that fencing fails until the server has actually gone down, is there an error code that the fencing script should send out?
As far as the return code of the script goes, there is no intermediate value that says 'please wait'; it reports either success or failure. If we can detect the dump state at the time of fencing, it seems likely that the machine is headed down, so I am OK with returning success.

In regards to the 'Big Enchilada' mentioned above by Dan, wherein we try to detect crash dumps initiated by the system rather than the user - I fear corner cases there. There should be a separate ticket for that, please. I wish we could just detect a value in /proc or sysfs that would tell us the dump state. Do you know of any? That would probably be faster, and maybe even more reliable, than checking whether logs are growing / have a modification time within milliseconds of the current time / parsing...

I intend to take the patch in the next update of RHEL5 as well as RHEL4. The customer should feel comfortable using the agent script they are using (the one they submitted as an attachment to this ticket). Is that OK? All parties happy for now? Lon?
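Spelled out, the contract described above is just success-or-failure (blade_is_dumping and fence_blade are hypothetical helpers standing in for the script's real logic):

    # Sketch of the agent's exit-code contract: there is no "please wait".
    #   exit 0 -> fence succeeded, or the node is verifiably down/dumping
    #   exit 1 -> fence failed; the fence daemon will retry
    if blade_is_dumping || fence_blade; then
        exit 0
    else
        exit 1
    fi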
Jim, I don't see any acks for this. Is it still on for 5.1/4.6?
This is all set for inclusion in 4.6
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0996.html