Bug 238490
Summary: clvmd startup times out and fails, but continues running
Product: [Retired] Red Hat Cluster Suite
Component: dlm-kernel
Version: 4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Nate Straz <nstraz>
Assignee: Christine Caulfield <ccaulfie>
QA Contact: Cluster QE <mspqa-list>
CC: agk, ccaulfie, cluster-maint, djansa, dwysocha, jbrassow, mbroz, prockai, rpeterso, teigland
Keywords: ZStream
Target Milestone: ---
Target Release: ---
Fixed In Version: RHBA-2007-0995
Doc Type: Bug Fix
Bug Blocks: 251179, 267641
Last Closed: 2007-11-21 21:55:42 UTC
Description
Nate Straz
2007-04-30 20:48:18 UTC
It's more a question of how you want startup to work. Are you prepared to wait indefinitely for clvmd to get running, or do you want it to time out and let the startup continue? It's simply a question of setting the right timeout, surely. OK, there are issues of why it takes this long to get going, but if the rest of the cluster is very busy then that's a lot of network and CPU traffic for the (lower-priority) joining protocols to get through.

I think there is more here than just picking the right timeout values. We've never been able to get useful debug output or tracing from clvmd. It's a black box that often fails and never helps to answer why. Why is clvmd still running after it reported that it failed to start up? Why didn't it activate clustered volume groups after it did connect to the cluster?

I don't see why it's any more of a black box than any other binary. Starting it with -d will add debugging output from it and stop it forking; that's no secret.

What happens when you start clvmd is that the parent process forks the 'real' clvmd and waits for it to acknowledge that it has started up: connected to the cluster manager and created its lockspace (note, NOT activated logical volumes). If this takes too long, the parent process can time out, but the daemon will carry on. If errors occur here then you will get output from clvmd saying what they were. If the parent process exits non-zero then the volumes will not be activated ... that's part of the init script and perfectly transparent.

The reason clvmd starts up in this "odd" way is to try to provide MORE feedback to the console, and to allow some synchronisation so that no attempt to activate the volumes happens when clvmd has failed to start up. If the daemon just started up without that, then it could fail and no one would know (unless you inspected syslog), and the script would be littered with "sleep" and check commands to see if it had started up.
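The fork-and-acknowledge startup just described can be sketched as follows. This is a minimal illustration, not clvmd's actual code: the one-byte ack over a pipe, the `select()` timeout, and the `child_delay_sec` knob (standing in for however long the cluster join takes) are all assumptions for the demo.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent forks the "real" daemon and waits up to timeout_sec for a
 * one-byte ack meaning "connected to the cluster, lockspace created".
 * Returns 0 if the ack arrived in time, 1 on timeout.  Either way the
 * child carries on, just as the real daemon does. */
int wait_for_daemon_ack(unsigned child_delay_sec, long timeout_sec)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                  /* the "real" daemon */
        close(fds[0]);
        sleep(child_delay_sec);      /* stands in for cluster join + lockspace */
        char ack = 0;                /* 0 == started OK */
        write(fds[1], &ack, 1);
        _exit(0);                    /* a real daemon would keep running here */
    }

    close(fds[1]);                   /* parent side */
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fds[0], &rfds);
    struct timeval tv = { timeout_sec, 0 };

    int rv = select(fds[0] + 1, &rfds, NULL, NULL, &tv);
    char status;
    int result = (rv > 0 && read(fds[0], &status, 1) == 1 && status == 0)
                     ? 0             /* startup succeeded */
                     : 1;            /* "clvmd startup timed out" */
    close(fds[0]);
    waitpid(pid, NULL, 0);           /* reap; demo only */
    return result;
}
```

In the timeout case the parent exits non-zero, the init script logs the failure and skips volume activation, and the daemon keeps running, which matches the behaviour described in this report.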
And even so, it would have the same effect as what's there now if you view the init script as a black box: it would either wait indefinitely or it would time out. So what happened here is that clvmd took a long time to connect to the cluster or create the lockspace, so the parent timed out. Because the parent timed out, the script did not activate the logical volumes. clvmd itself continued to run and probably got going eventually.

By making the debug mode not fork, it is really hard to run the debug version in the same environment where we hit the bug. Whenever we have used the debug option, the output hasn't been any more useful. Can we separate the debug and don't-fork options? So if the parent process times out, why doesn't it kill the "real" process? It's counter-intuitive to me that it logs a failure yet continues to run. Can the debug option add more output while it is trying to connect to the cluster? Right now I don't know if clvmd is waiting for a join response or creating the lock space or whatever.

Not forking when the debug flag is set is pretty standard daemon behaviour; all the other cluster suite daemons do it, as do a great many others. I suppose it could kill the parent process, but I'm not really in favour of killing a process that's doing useful work. If the script wants to do that then that's fair enough, but I don't think it belongs in the binary. Adding more debugging output is fair enough - though to be honest I think you'll find that 99% of the time it's waiting for the lockspace to be created.

I've added some more debug logging to clvmd at startup so you can see where it's waiting. It's in CVS, but don't expect it to turn up in a package near you any time soon!

I'm running with a modified clvmd. The modified version always writes debug statements to /tmp/clvmd.log. The delay seems to be DLM initialisation.
CLVMD[b7f318e0]: May 7 14:32:40 CMAN initialisation complete
CLVMD[b7f318e0]: May 7 14:33:00 DLM initialisation complete

I looked back through my failed run logs, and every time startup fails we're coming back from shooting three out of five nodes. So the node that fails clvmd startup is probably the first one to come up before the cluster regains quorum. I don't know what the startup scripts can do to handle this. Perhaps we'll have to put something in our tests to activate volume groups after recovery is verified.

If this is quorum-related, is it worth making clvmd (perhaps optionally) ignore the timeout and wait for quorum instead? Another solution is to put something that /does/ wait for quorum ahead of clvmd in the startup sequence.

Doesn't the cman startup script wait for quorum? It does seem odd to me that that could be the reason. Here is a larger chunk of /var/log/messages from a failing case.

May 2 14:16:08 morph-05 kernel: CMAN: Waiting to join or form a Linux-cluster
May 2 14:16:08 morph-05 ccsd[2713]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
May 2 14:16:08 morph-05 ccsd[2713]: Initial status:: Inquorate
May 2 14:16:10 morph-05 kernel: CMAN: sending membership request
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-04
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-02
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-01
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-03
May 2 14:16:11 morph-05 kernel: CMAN: quorum regained, resuming activity
May 2 14:16:11 morph-05 ccsd[2713]: Cluster is quorate. Allowing connections.
May 2 14:16:11 morph-05 cman: startup succeeded
May 2 14:16:11 morph-05 kernel: dm-cmirror: dm-cmirror 0.2.0 (built Apr 26 2007 16:33:46) installed
May 2 14:16:11 morph-05 cmirror: startup succeeded
May 2 14:16:11 morph-05 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
May 2 14:16:26 morph-05 fenced: startup succeeded
May 2 14:16:46 morph-05 clvmd: clvmd startup timed out
May 2 14:16:46 morph-05 clvmd: clvmd startup failed

Are there other quorums that we need to wait for? Dave, do you have any ideas?

Created attachment 160316 [details]
group_tool dump after the failure
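Returning to the quorum suggestion above (putting something that waits for quorum ahead of clvmd in the startup sequence), a minimal polling loop might look like the sketch below. The `is_quorate` predicate is a hypothetical hook; a real script would ask the cluster manager, for example via `cman_tool status`.

```c
#include <unistd.h>

/* Demo stand-in for "ask the cluster manager": pretends quorum
 * arrives on the third poll.  A real check would query cman. */
static int demo_polls;
static int demo_is_quorate(void) { return ++demo_polls >= 3; }

/* Poll until quorate or out of attempts.  Returns the number of
 * polls it took, or -1 if quorum never arrived. */
int wait_for_quorum(int (*is_quorate)(void), int max_polls, unsigned interval_sec)
{
    for (int i = 1; i <= max_polls; i++) {
        if (is_quorate())
            return i;
        sleep(interval_sec);
    }
    return -1;
}
```

A startup step like this would only let clvmd (and volume activation) proceed once the first nodes back from fencing have rejoined quorum.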
I got this problem when revolver forced recovery. Additional info:
[root@roth-03 ~]# group_tool -v
type level name id state node id local_done
fence 0 default 00010002 none
[1 2 3]
dlm 1 clvmd 00010003 none
[1 2 3]
[root@roth-03 ~]# ps ax -o pid,cmd,wchan | grep clvmd
2245 clvmd -T20 stext
An strace of clvmd showed this:
open("/dev/misc/dlm-control", O_RDWR) = 6
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
read(6, "\6\0\0\0\0\0\0\0\0\0\0\0", 12) = 12
write(6, "\6\0\0\0\0\0\0\0\0\0\0\0\4\1\0\0\0\0\0\0\0\0\0\0clvmd\0"..., 125
<unfinished ...>
Which, according to Patrick, means create_lockspace didn't respond.
The funny thing is that the clvmd lock space apparently already existed.
Created attachment 160317 [details]
sysrq-t at the time of failure.
Created attachment 160411 [details]
Patch to fix (I hope)
This seems to be to do with the DLM 'othercon' thing in lowcomms. They don't get freed up (and pointers NULLed) as they should do. A previous patch addressed this to some extent but missed some important areas.
Can you test this for me please, Bob? It works on my systems. I've left some debugging in there in case it still misbehaves.
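The bug class being described here, a freed object whose owning pointer is never cleared, can be sketched like this. The struct below is a hypothetical stand-in, not the actual lowcomms code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a lowcomms per-node connection; in the
 * real code 'othercon' is a second connection object used when two
 * nodes happen to connect to each other at the same time. */
struct connection {
    struct connection *othercon;
    /* ... socket, buffers, etc. ... */
};

/* The fix pattern: free the secondary connection AND null the pointer,
 * so a later reconnect neither reuses stale memory nor frees it twice. */
void close_connection(struct connection *con)
{
    if (con->othercon) {
        free(con->othercon);
        con->othercon = NULL;   /* without this a dangling pointer survives */
    }
}
```

With the pointer nulled, calling the teardown path twice is harmless, which is exactly the property a recovery/reconnect sequence needs.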
The previously attached patch seems to work well.

OK, I've sent the patch to Steve.

Hmm, just spotted that all of Nate's comments refer to RHEL-4, whereas Bob's (and my patch) are for RHEL-5. Whatever, I think this patch needs to go into RHEL 5.1. Can we have some acks?

Bug #251179 cloned for 5.1

Created attachment 161164 [details]
New Patch

This is a much neater patch, which also addresses a problem with the earlier patch that caused bz#251562. Would you test this for me please, Bob?

Created attachment 161165 [details]
RHEL-5 patch
Here's a patch that applies to a RHEL-5 kernel.
Created attachment 161176 [details]
RHEL-4 patch
Just to complete the set, here's the RHEL-4 patch :-)
*** Bug 251562 has been marked as a duplicate of this bug. ***

Setting flags and also requesting the 4.5.z release - need to respin dlm-kernel for the errata release.

On RHEL4:
Checking in lowcomms.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/lowcomms.c,v  <--  lowcomms.c
new revision: 1.22.2.17; previous revision: 1.22.2.16
done

And on RHEL45:
Checking in lowcomms.c;
/cvs/cluster/cluster/dlm-kernel/src/Attic/lowcomms.c,v  <--  lowcomms.c
new revision: 1.22.2.13.2.3; previous revision: 1.22.2.13.2.2
done

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0995.html