333161 – Feature Request: timeout on start/stop actions

Summary: Feature Request: timeout on start/stop actions

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	rgmanager
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Ryan O'Hara
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-10-15 21:39 UTC by Anthony Green
Modified:	2009-05-18 21:13 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-18 21:13:25 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1048	0	normal	SHIPPED_LIVE	rgmanager bug-fix and enhancement update	2009-05-18 21:12:29 UTC

Description Anthony Green 2007-10-15 21:39:31 UTC

Description of problem:
We've seen several instances with Cluster Suite where startup or shutdown hangs.
With startup, one of the GFS mounts hung during the mount command. Strace on
the mount command showed no activity in the process, so it was apparently
blocked. During a stop call when relocating services, our clustered application
failed to shut down and hung.

For the stop failure, we could add a timer to the stop action of our
/etc/init.d script, but that seems like overkill, and some of the more advanced
features are a pain to implement in shell script. It would be desirable if
Cluster Suite set a configurable timer and, if starting the application takes
too long, considered the start to have failed. For the stop action, it would be
desirable to have a similar timer with the option of calling a second action if
the timer is exceeded: for example, stop, wait 5 minutes, and then call the
abort action.


Comment 2 Lon Hohberger 2007-10-23 20:48:17 UTC

Ok:

start timeout => normal recovery (i.e., try to stop, then move somewhere else in
the cluster)

stop timeout => critical (i.e., if hardrecovery is set, reboot the node;
otherwise, mark the service 'failed')

status timeout => normal recovery
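The three rules above amount to a small decision table. A minimal sketch in Python; `Recovery`, `on_timeout`, and the boolean `hardrecovery` parameter are hypothetical names that only mirror the comment's wording, not rgmanager's actual code.

```python
from enum import Enum


class Recovery(Enum):
    RELOCATE = "try to stop, then move the service elsewhere in the cluster"
    REBOOT_NODE = "reboot the node"
    MARK_FAILED = "mark the service 'failed'"


def on_timeout(action: str, hardrecovery: bool) -> Recovery:
    """Map a timed-out action to the recovery described in the comment."""
    if action in ("start", "status"):
        # start and status timeouts both get normal recovery
        return Recovery.RELOCATE
    if action == "stop":
        # a stop timeout is critical: escalate according to hardrecovery
        return Recovery.REBOOT_NODE if hardrecovery else Recovery.MARK_FAILED
    raise ValueError(f"no timeout policy for action {action!r}")


print(on_timeout("stop", hardrecovery=False).name)  # MARK_FAILED
```

Start and status share one branch because both map to normal recovery; only a stop timeout depends on whether hardrecovery is set.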

Comment 5 Lon Hohberger 2008-08-18 21:03:50 UTC

An upstream/community member would also like a status timeout.

Comment 8 Lon Hohberger 2009-01-23 16:23:45 UTC

5.3 commit was:

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=32945a03a4721e73f153e7c02c4b04679b55ae18

Comment 9 Ryan O'Hara 2009-01-23 22:03:03 UTC

Fixed in RHEL4.

http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=c78d4a719b6264e4ff4165b6ef564de018c8418f

Comment 13 errata-xmlrpc 2009-05-18 21:13:25 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1048.html

Note You need to log in before you can comment on or make changes to this bug.