Description of problem:
We have seen several instances with ClusterSuite where startup or shutdown hangs. On startup, one of the GFS mounts hung during the mount command; strace on the mount command showed no activity in the process, so it was apparently blocked. During a stop call while relocating services, our clustered application failed to shut down and hung.

For the stop failure, we could add a timer to the stop action of our /etc/init.d script, but that seems like overkill, and the more advanced features are painful to implement in shell script.

It would be desirable for ClusterSuite to set a configurable timer so that, if starting the application takes too long, the start is considered to have failed. For the stop action, a similar timer would be desirable, with the possibility of calling a second action if the timer is exceeded: for example, call stop, wait 5 minutes, and then call the abort action.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
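To make the requested stop behaviour concrete, here is a minimal sketch of "stop, wait, then abort", written in Python rather than as part of ClusterSuite itself. The function name `stop_with_fallback`, its return strings, and the idea of passing the stop and abort actions as command lists are all assumptions for illustration; nothing here reflects how rgmanager actually implements timeouts.

```python
import subprocess


def stop_with_fallback(stop_cmd, abort_cmd, timeout_s):
    """Run stop_cmd; if it exceeds timeout_s seconds, run abort_cmd instead.

    Hypothetical helper: stop_cmd and abort_cmd are argument lists such as
    ["/etc/init.d/myapp", "stop"]. Returns a short status string.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if the command does not finish within timeout_s.
        result = subprocess.run(stop_cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The stop action hung: fall back to the second (abort) action.
        abort = subprocess.run(abort_cmd)
        return "aborted" if abort.returncode == 0 else "abort-failed"
    return "stopped" if result.returncode == 0 else "stop-failed"
```

With a 5 minute limit as in the example above, the call would look like `stop_with_fallback(stop_cmd, abort_cmd, 300)`. Note that a process blocked in the kernel, as the hung mount appeared to be, may not respond to being killed, so a timer of this kind would probably not cover that case.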
Ok:
start timeout => normal recovery (i.e., try to stop, then move somewhere else in the cluster)
stop timeout => critical (i.e., if hardrecovery is set, reboot the node; else, mark service 'failed')
status timeout => normal recovery
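The policy above can be sketched as a small decision function. This is an illustration only: the function name `on_timeout` and the returned step names are hypothetical, and only the `hardrecovery` setting and the three action names come from the comment above.

```python
def on_timeout(action, hardrecovery=False):
    """Return the recovery steps to take when `action` times out.

    Step names ("stop", "relocate", "reboot-node", "mark-failed") are
    hypothetical labels for the behaviour described in the comment.
    """
    if action in ("start", "status"):
        # Normal recovery: try to stop, then move elsewhere in the cluster.
        return ["stop", "relocate"]
    if action == "stop":
        # Critical: reboot the node if hardrecovery is set,
        # otherwise mark the service 'failed'.
        return ["reboot-node"] if hardrecovery else ["mark-failed"]
    raise ValueError("unknown action: %r" % (action,))
```

Treating a stop timeout as critical follows from the fact that a service that cannot be confirmed stopped cannot safely be started on another node.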
Upstream / community member would also like a status timeout
5.3 commit was: http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=32945a03a4721e73f153e7c02c4b04679b55ae18
Fixed in RHEL4. http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=c78d4a719b6264e4ff4165b6ef564de018c8418f
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1048.html