Summary:          clvmd startup times out and fails, but continues running
Product:          Red Hat Enterprise Linux 5
Reporter:         Christine Caulfield <ccaulfie>
Component:        kernel
Assignee:         Don Zickus <dzickus>
Status:           CLOSED ERRATA
QA Contact:       Cluster QE <mspqa-list>
Version:          5.1
CC:               agk, ccaulfie, dwysocha, jbrassow, mbroz, prockai, rmccabe, rpeterso, teigland
Fixed In Version: RHBA-2007-0959
Doc Type:         Bug Fix
Doc Text:
Story Points:     ---
Last Closed:      2007-11-07 19:58:14 UTC
Type:             ---
oVirt Team:       ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On:   238490
Description Christine Caulfield 2007-08-07 16:23:02 UTC
+++ This bug was initially created as a clone of Bug #238490 +++

Description of problem:
While doing recovery testing with revolver, sometimes clvmd will fail to start through the init scripts, but clvmd is still running after the test fails to find active volumes.

Apr 30 12:05:31 morph-05 fenced: startup succeeded
Apr 30 12:05:51 morph-05 clvmd: clvmd startup timed out
Apr 30 12:05:51 morph-05 clvmd: clvmd startup failed
...
Apr 30 12:05:56 morph-05 clvmd: Cluster LVM daemon started - connected to CMAN

[root@morph-05 ~]# vgs
  VG            #PV #LV #SN Attr   VSize   VFree
  morph-cluster   3   3   0 wz--nc 712.21G    0

[root@morph-05 ~]# lvs
  LV             VG            Attr   LSize   Origin Snap% Move Log Copy%
  morph-cluster0 morph-cluster -wi--- 237.40G
  morph-cluster1 morph-cluster -wi--- 237.40G
  morph-cluster2 morph-cluster -wi--- 237.40G

[root@morph-05 ~]# ls /dev/mapper
control  VolGroup00-LogVol00  VolGroup00-LogVol01

[root@morph-05 ~]# ps -fC clvmd
UID        PID  PPID  C STIME TTY          TIME CMD
root      2824     1  0 12:05 ?        00:00:00 clvmd -T20 -t 90

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.21-7.el4

How reproducible:
Once every 10 iterations.

Steps to Reproduce:
1. Form a cluster with GFS and run an I/O load.
2. reboot -fin nodes randomly.
3. When a node returns, try to remount the file systems.

Actual results:
See above.

Expected results:
a. clvmd should start up.
b. clvmd should not be running if it can't start up.
c. clvmd should log a useful error message when it can't start up.

Additional info:

-- Additional comment from firstname.lastname@example.org on 2007-05-01 04:10 EST --

It's more a question of how you want startup to work. Are you prepared to wait indefinitely for clvmd to get running, or do you want it to time out and let the startup continue? It's simply a question of setting the right timeout, surely.
OK, there are issues of why it takes this long to get going, but if the rest of the cluster is very busy then that's a lot of network and CPU traffic for the (lower-priority) joining protocols to get through.

-- Additional comment from email@example.com on 2007-05-01 10:00 EST --

I think there is more here than just picking the right timeout values. We've never been able to get useful debug output or tracing from clvmd. It's a black box that often fails and never helps to answer why.

Why is clvmd still running after it reported that it failed to start up? Why didn't it activate clustered volume groups after it did connect to the cluster?

-- Additional comment from firstname.lastname@example.org on 2007-05-01 10:28 EST --

I don't see why it's any more of a black box than any other binary. Starting it with -d will add debugging output from it and stop it forking; that's no secret.

What happens when you start clvmd is that the parent process forks the 'real' clvmd and waits for it to acknowledge that it has started up: connected to the cluster manager and created its lockspace (note: NOT activated logical volumes). If this takes too long, the parent process can time out, but the daemon will carry on. If errors occur here then you will get output from clvmd saying what they were. If the parent process exits non-zero then the volumes will not be activated; that's part of the init script and perfectly transparent.

The reason clvmd starts up in this "odd" way is to try to provide MORE feedback to the console, and to allow some synchronisation so that no attempt to activate the volumes happens when clvmd has failed to start up. If the daemon just started up without forking, then it could fail and no-one would know (unless you inspected syslog), and the script would be littered with "sleep" and check commands to see if it had started up.
And even so, it would have the same effect as what's there now if you view the init script as a black box: it would either wait indefinitely or it would time out.

So what happened here is that clvmd took so long to connect to the cluster or create the lockspace that the parent timed out. Because the parent timed out, the script did not activate the logical volumes. clvmd itself continued to run and probably got going eventually.

-- Additional comment from email@example.com on 2007-05-01 11:01 EST --

By making the debug mode not fork, it is really hard to run the debug version in the same environment where we hit the bug. Whenever we have used the debug option, the output hasn't been any more useful. Can we separate the debug and don't-fork options?

So if the parent process times out, why doesn't it kill the "real" process? It's counter-intuitive to me that it logs a failure yet continues to run.

Can the debug option add more output while it is trying to connect to the cluster? Right now I don't know if clvmd is waiting for a join response, or creating the lockspace, or whatever.

-- Additional comment from firstname.lastname@example.org on 2007-05-01 11:17 EST --

Not forking when the debug flag is set is pretty standard daemon behaviour; all the other cluster suite daemons do it, as do a great many others.

I suppose the parent could kill the 'real' process, but I'm not really in favour of killing a process that's doing useful work. If the script wants to do that then that's fair enough, but I don't think it belongs in the binary.

Adding more debugging output is fair enough, though to be honest I think you'll find that 99% of the time it's waiting for the lockspace to be created.

-- Additional comment from email@example.com on 2007-05-02 04:25 EST --

I've added some more debug logging to clvmd at startup so you can see where it's waiting. It's in CVS, but don't expect it to turn up in a package near you any time soon!
-- Additional comment from firstname.lastname@example.org on 2007-05-07 15:58 EST --

I'm running with a modified clvmd. The modified version always writes debug statements to /tmp/clvmd.log. The delay seems to be DLM initialisation.

CLVMD[b7f318e0]: May 7 14:32:40 CMAN initialisation complete
CLVMD[b7f318e0]: May 7 14:33:00 DLM initialisation complete

I looked back through my failed run logs, and every time startup fails we're coming back from shooting three out of five nodes. So the node that fails clvmd startup is probably the first one to come up before the cluster regains quorum. I don't know what the startup scripts can do to handle this. Perhaps we'll have to put something in our tests to activate volume groups after recovery is verified.

-- Additional comment from email@example.com on 2007-05-08 11:00 EST --

If this is quorum-related, is it worth making clvmd (perhaps optionally) ignore the timeout and wait for quorum instead? Another solution is to put something that /does/ wait for quorum ahead of clvmd in the startup sequence.

-- Additional comment from firstname.lastname@example.org on 2007-05-08 15:32 EST --

Doesn't the cman startup script wait for quorum? It does seem odd to me that that could be the reason. Here is a larger chunk of /var/log/messages from a failing case.

May 2 14:16:08 morph-05 kernel: CMAN: Waiting to join or form a Linux-cluster
May 2 14:16:08 morph-05 ccsd: Connected to cluster infrastruture via: CMAN/SM Plugin v126.96.36.199
May 2 14:16:08 morph-05 ccsd: Initial status:: Inquorate
May 2 14:16:10 morph-05 kernel: CMAN: sending membership request
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-04
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-02
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-01
May 2 14:16:11 morph-05 kernel: CMAN: got node morph-03
May 2 14:16:11 morph-05 kernel: CMAN: quorum regained, resuming activity
May 2 14:16:11 morph-05 ccsd: Cluster is quorate. Allowing connections.
May 2 14:16:11 morph-05 cman: startup succeeded
May 2 14:16:11 morph-05 kernel: dm-cmirror: dm-cmirror 0.2.0 (built Apr 26 2007 16:33:46) installed
May 2 14:16:11 morph-05 cmirror: startup succeeded
May 2 14:16:11 morph-05 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
May 2 14:16:26 morph-05 fenced: startup succeeded
May 2 14:16:46 morph-05 clvmd: clvmd startup timed out
May 2 14:16:46 morph-05 clvmd: clvmd startup failed

Are there other quorums that we need to wait for? Dave, do you have any ideas?

-- Additional comment from email@example.com on 2007-07-31 09:15 EST --

Created an attachment (id=160316)
group_tool dump after the failure

I got this problem when revolver forced recovery. Additional info:

[root@roth-03 ~]# group_tool -v
type level name id state node id local_done
fence 0 default 00010002 none
[1 2 3]
dlm 1 clvmd 00010003 none
[1 2 3]

[root@roth-03 ~]# ps ax -o pid,cmd,wchan | grep clvmd
 2245 clvmd -T20   stext

An strace of clvmd showed this:

open("/dev/misc/dlm-control", O_RDWR) = 6
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
read(6, "\6\0\0\0\0\0\0\0\0\0\0\0", 12) = 12
write(6, "\6\0\0\0\0\0\0\0\0\0\0\0\4\1\0\0\0\0\0\0\0\0\0\0clvmd\0"..., 125 <unfinished ...>

Which, according to Patrick, means create_lockspace didn't respond. The funny thing is that the clvmd lockspace apparently already existed.

-- Additional comment from firstname.lastname@example.org on 2007-07-31 09:18 EST --

Created an attachment (id=160317)
sysrq-t at the time of failure

-- Additional comment from email@example.com on 2007-08-01 09:53 EST --

Created an attachment (id=160411)
Patch to fix (I hope)

This seems to be to do with the DLM 'othercon' connections in lowcomms. They don't get freed up (and their pointers NULLed) as they should be. A previous patch addressed this to some extent but missed some important areas.

Can you test this for me please, Bob? It works on my systems. I've left some debugging in there in case it still misbehaves.
-- Additional comment from firstname.lastname@example.org on 2007-08-02 09:41 EST --

The previously attached patch seems to work well.

-- Additional comment from email@example.com on 2007-08-02 09:59 EST --

OK, I've sent the patch to Steve.

-- Additional comment from firstname.lastname@example.org on 2007-08-03 09:10 EST --

Hmm, I just spotted that all of Nate's comments refer to RHEL-4, whereas Bob's (and my patch) are for RHEL-5.

-- Additional comment from email@example.com on 2007-08-07 12:15 EST --

Either way, I think this patch needs to go into RHEL 5.1. Can we have some acks?
Comment 1 Christine Caulfield 2007-08-07 16:23:41 UTC
This is a clone to get ACKS for 5.1
Comment 2 Christine Caulfield 2007-08-13 09:18:12 UTC
Based on what I've seen in bz#251562, there is a better (read: simpler) fix available for this. It needs a little more testing on RHEL4, then porting into the RHEL5 lowcomms.
Comment 3 Dean Jansa 2007-08-13 14:42:38 UTC
This issue will likely cause GFS2 recovery testing to fail, which will not allow us to meet our GA release criteria. Marking blocker so we can get this fix into a build.
Comment 4 Don Zickus 2007-08-21 18:35:34 UTC
In 2.6.18-42.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 8 errata-xmlrpc 2007-11-07 19:58:14 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html