Bug 461948
| Summary: | Provide a method to wait for kdump to complete from fencing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Lon Hohberger <lhh> |
| Component: | fence-agents | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.2 | CC: | ajb2, casmith, ccaulfie, cluster-maint, crenshaw, djansa, djuran, fdinitto, grimme, hklein, hlawatschek, kskmori, lhh, michael.hagmann, mjuricek, nhorman, qcai, rick.beldin, rohara, rpeterso, samuel.kielek, syeghiay, tao, teigland, vgoyal |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | fence-agents-3.1.5-5.el6 | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 918795 (view as bug list) | Environment: | |
| Last Closed: | 2011-12-06 12:22:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 309991, 585266, 585332, 918795, 1081175 | | |
| Attachments: | | | |
Description (Lon Hohberger, 2008-09-11 16:14:33 UTC)
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

This is a duplicate of bug 309991; I don't know which to close.

Same suggestion as the RHEL5 equivalent, bz 309991: use SAN fencing. And if someone wants a form of automatic reboot, call "reboot" at the end of the kdump routine, or set up a watchdog to monitor the kdump and reboot when it's done.

Modifying Lon's original description a bit based on IRC discussion:

(1) kdump side

- The kdump environment is required to have all the same networks configured in the same way as the normal cluster environment.
- The kdump environment starts a new daemon, fence_dump_ack, which broadcasts DONTTAZEMEBRO packets on all networks every N seconds, as long as kdump continues to run properly.
- fence_dump_ack needs to not be starved by the heavy disk I/O.
- I don't know what the complicating factors would be on this side.

(2) Cluster (fence) side

- A new fence agent, fence_dump_check, listens for broadcast messages on the cluster network.
- Upon receiving a dump_ack message, fence_dump_check compares the source IP to the IP of the victim node name; if it matches, it returns 0 (success) and the fencing is complete.
- fence_dump_check waits for up to M seconds to receive a matching dump_ack; if it doesn't, it returns 1 (failure), the fence operation fails, and fenced moves on to the next configured fence agent.

Honestly, this doesn't seem as complicated as I feared it might be, although there could be some nasty situations that come up in implementation and testing. One complicating factor will be non-crash fencing situations, where the failed/victim node is not kdumping, e.g. startup fencing, or fencing due to network disruption or partition.
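Both sides of the proposed protocol can be sketched in a few lines. This is only an illustration of the idea, not the eventual agent: the port number, packet format, and function names are invented here, and the real agent would of course parse standard fencing arguments from stdin.

```python
import socket
import time

MAGIC = b"DONTTAZEMEBRO"   # packet marker from the proposal above
PORT = 7410                # hypothetical port; not specified in this bug

def send_acks(nodename, interval=10.0, count=None, dest=("<broadcast>", PORT)):
    """kdump side (fence_dump_ack): broadcast a dump_ack every `interval`
    seconds for as long as kdump keeps running.  `count` and `dest` exist
    only so the sketch can be exercised without a real broadcast network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sent = 0
    while count is None or sent < count:
        sock.sendto(MAGIC + b" " + nodename.encode(), dest)
        sent += 1
        time.sleep(interval)

def wait_for_ack(victim_ip, timeout, port=PORT):
    """Cluster side (fence_dump_check): wait up to `timeout` seconds for a
    dump_ack whose source address matches the victim.  Returns 0 if the
    victim is dumping (the fence "succeeds"), 1 otherwise, in which case
    fenced would move on to the next configured agent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return 1
            sock.settimeout(remaining)
            try:
                data, (src_ip, _src_port) = sock.recvfrom(1024)
            except socket.timeout:
                continue  # deadline re-checked at the top of the loop
            if data.startswith(MAGIC) and src_ip == victim_ip:
                return 0
    finally:
        sock.close()
```

Note that this captures the M-second worst case mentioned above: for a victim that is not actually dumping, `wait_for_ack` burns the full timeout before failing over to the real fence agent.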
In each of these cases we'll be running fence_dump_check for M seconds for each victim that's not actually dumping before getting to the "real" fence agent. SAN fencing still seems like a better solution in general, but if that's not available, this may work passably.

Questions for the kdump experts here:

1. How do we ensure that the kdump network environment mirrors the real environment?
2. What can/can't we run as part of the kdump pre script? Will running a daemon that sends out broadcast packets on all interfaces work, or will that be problematic?

Question for dct: your described implementation of fence_dump_ack means that if kdump is successful, the machine is never rebooted via a power fence. Would it be useful for fence_dump_ack to return success so that the cluster can continue, but also spawn a separate thread/process that waits for the final broadcast message from the dumping node, so that it can call the secondary fence agent that will actually power cycle the node?

Sorry, I've never used kdump, but doesn't it reboot the machine when the dump completes? Spawning something on the fencing node to monitor the remote kdump and power cycle the machine when it finishes sounds really ugly. The machine doing the kdump should really be responsible for rebooting itself one way or another, even if that means using a watchdog for the worst situations.

Agreed. Assuming kdump reboots the machine, we don't need what I described above, just the simple solution you outlined.

If you want fencing to reboot the node after kdump completes, you can simply have two different sorts of packets which fence_dump_check can wait for:

- WAIT: sent periodically while kdump is running
- DONE: sent after kdump completes

If fence_dump_check is not configured to honor WAIT packets, then fence_dump_check will exit immediately (success) after the first WAIT packet is received. At that point, the node will have entered the kdump environment, replacing the old kernel.
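The WAIT/DONE handling just described boils down to a small decision function. A sketch, with the packet names taken from the comment and a `honor_wait` flag standing in for the proposed `wait="1"` option:

```python
def decide(packet_type, honor_wait):
    """Decide the fence_dump_check exit code for one received packet.

    Returns 0 when the check can succeed immediately, or None when the
    agent should keep listening (until its timeout expires).  Packet
    names WAIT/DONE come from the discussion; the flag name is invented.
    """
    if packet_type == "DONE":
        return 0          # dump finished; later agents may now power cycle
    if packet_type == "WAIT" and not honor_wait:
        return 0          # node entered kdump; that alone counts as fenced
    return None           # WAIT while honoring: extend the wait for DONE
```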
If fence_dump_check is configured to honor WAIT packets (e.g. fence_dump_check wait="1" or whatever), then WAIT packets can extend the wait time: we don't return until (a) timeout or (b) a DONE packet is received. If users decide to use kdump in this way, other fencing agents at the same level may be used to reboot the node after the DONE packet is received, without relying on further modifications to, or requirements on, the kdump environment.

Also, we could have a meta attribute passed in to fencing agents, generated by fenced (for example, the number of remaining fencing agents for this method/level), making the requirement to configure a special attribute for fence_dump_check obsolete. E.g. if meta_remaining > 0, honor WAIT packets; otherwise don't.

If we really need to insert a reboot into the process, just have fence_dump_ack call reboot when it sees the dump finish... but I still think kdump itself may prefer to be in charge of that. (BTW, I've been thinking for a while that fence_dump_ack looks an awful lot like a watchdog monitor. I wonder if we could just configure the watchdog program to: monitor the kdump, reboot if the dump stalls, reboot after the dump completes, and periodically call fence_dump_ack, which would just send the broadcast.)

This feature would require substantial work if it were to be done.

*** Bug 309991 has been marked as a duplicate of this bug. ***

Andrew pointed me at this: http://www.gossamer-threads.com/lists/linuxha/dev/51968

It's not the same design we had in mind, but it's a lot less code to write. This one creates a special user in the kdump environment and copies in an SSH key. The cluster then uses ssh to connect to the kdump environment and check whether the host is dumping or not.
There are some issues with the implementation:

1) The stonith module appears to wait forever if it fails to connect.
2) The stonith module is a stonith module, and not usable by fenced at this point.

I have not checked whether the patch to mkdumprd still works; the patch was submitted to linux-ha-dev in 2008, shortly after the cluster summit in Prague.

That's not a bad idea, although I'm a little hesitant to start an sshd server while we're kdumping. Nominally the kdump kernel/initrd is going to be operating in a very restrictive memory environment (typically 128 MB). If we're doing dump filtering (which is memory intensive) and we need to service ssh operations in parallel, we might be looking at out-of-memory conditions, or at least some failed allocations. It would be better if the dumping system could initiate an action to prevent it from being fenced. That way we can serialize the fence-prevention and dump-capture operations and save on memory use. That said, it won't hurt to test this patch out and see how it goes.

Notes from Lon in a discussion he had with Neil/Subhendu. Issues with the ssh implementation that Andrew noted: it is very sensitive to

- ssh key synchronization
- UID changes
- key changes

Also, adding sshd greatly increases the memory and dump image footprints.

Now, it turns out the model may be preferable, that is, have the cluster connect to the dumping machine rather than wait for a special packet. Specifically, we can likely simply use nc to implement a simple server on a predefined IP port. This will save a lot of memory and on-disk footprint for the dump ramdisk, because nc is a built-in for busybox.

Nobody to work on this; kick it down the road.

Pushed the new fence_kdump agent to the fence-agents repo, both master and RHEL6 branches.

Created attachment 516371 [details]
Using and testing the fence_kdump agent
Here is a write-up about how to use/test the new fence_kdump agent. I've described how you can test fence_kdump and fence_kdump_send independently as well as how to test it in a cluster with the kdump service enabled. Please send questions and comments.
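As an aside, the connect-based model discussed earlier (the cluster probing a trivial TCP listener, e.g. busybox `nc -l`, inside the kdump environment) is also only a few lines on the cluster side. This is a hypothetical sketch, not what fence_kdump actually does; the port number and function name are invented:

```python
import socket

KDUMP_PORT = 7411  # invented port for the in-kdump listener

def node_is_dumping(victim_addr, timeout=5.0):
    """Probe the listener the kdump environment would run.  A successful
    TCP connect means the victim has entered the kdump kernel and is
    (presumably) capturing a dump; any connect failure means it isn't."""
    try:
        with socket.create_connection((victim_addr, KDUMP_PORT),
                                      timeout=timeout):
            return True
    except OSError:
        return False
```

Compared with the broadcast design, this inverts the direction of traffic, which is what makes a memory-starved kdump environment happier: the dumping node only has to keep one trivial listener alive.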
Created attachment 516744 [details]
Using and testing the fence_kdump agent
Updated the usage/testing write-up to describe how to change the behavior of fence_kdump_send.
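For orientation, the intended shape of the configuration is fence_kdump as the first fence method, with a real power agent as the fallback that runs if no kdump messages arrive. The fragment below is a hypothetical illustration only; the device names, the IPMI agent choice, and its parameters are assumptions, not taken from the attachment:

```xml
<clusternode name="node1" nodeid="1">
  <fence>
    <!-- Tried first: "succeeds" only if node1 is capturing a dump -->
    <method name="kdump">
      <device name="kdump"/>
    </method>
    <!-- Fallback: real power fencing when no kdump message is heard -->
    <method name="power">
      <device name="ipmi-node1"/>
    </method>
  </fence>
</clusternode>
<!-- ... -->
<fencedevices>
  <fencedevice agent="fence_kdump" name="kdump"/>
  <fencedevice agent="fence_ipmilan" name="ipmi-node1"
               ipaddr="192.0.2.10" login="admin" passwd="secret"/>
</fencedevices>
```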
This would be handy on RHEL5 too.

Dave, the issue with fencing (power fencing) when a crashed node is running kdump is that half the time the node is rebooted before the vmcore is completed. Kdump supposedly reboots the machine, but that doesn't always happen...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1599.html