589266 – qdisk should have a timer for heuristic "programs" instead of relying on "program" to provide timeout


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	cman
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-05-05 17:22 UTC by Shane Bradley
Modified:	2018-11-14 19:43 UTC
CC List:	3 users
Fixed In Version:	cman-2.0.115-55.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	636243
Environment:
Last Closed:	2011-01-13 22:33:49 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Fix (7.04 KB, patch) 2010-09-21 19:22 UTC, Lon Hohberger

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0036	0	normal	SHIPPED_LIVE	cman bug-fix and enhancement update	2011-01-12 17:39:38 UTC

Description Shane Bradley 2010-05-05 17:22:46 UTC

Description of problem:

If a qdisk heuristic "program" does not enforce its own timeout, the
"program" runs until it completes. The drawback is that the time to
complete can be longer than (interval*tko).

An example of this would be:
<heuristic interval="2" program="sleep 100" score="1"/>

What is needed is a timer around the calling function that terminates
the "program" once it has exceeded (interval*tko), treats it as failed,
and declares that the node failed the heuristic.

static int check_heuristic(struct h_data *h, int block) {
   ...
   ret = waitpid(h->childpid, &status, block?0:WNOHANG);
   ...
} 
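
To illustrate the request, here is a minimal sketch of a deadline around
waitpid(). It is not the attached fix and not qdiskd code: spawn_shell()
and wait_with_deadline() are hypothetical names, the 50 ms poll delay is
an arbitrary choice, and the caller is assumed to pass (interval*tko) as
the timeout in seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from qdiskd): run a command through /bin/sh. */
static pid_t spawn_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }
    return pid; /* child pid in the parent, -1 if fork failed */
}

/*
 * Hypothetical helper: wait up to timeout_sec seconds for pid to exit.
 * Returns 0 if the child exited in time (*status is filled in),
 *         1 if the deadline passed and the child was killed and reaped,
 *        -1 on a waitpid() error.
 */
static int wait_with_deadline(pid_t pid, int timeout_sec, int *status)
{
    const struct timespec poll_delay = { 0, 50 * 1000 * 1000 }; /* 50 ms */
    struct timespec start, now;

    assert(timeout_sec >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        pid_t ret = waitpid(pid, status, WNOHANG);
        if (ret == pid)
            return 0; /* finished before the deadline */
        if (ret < 0 && errno != EINTR)
            return -1;

        clock_gettime(CLOCK_MONOTONIC, &now);
        long elapsed_ms = (long)(now.tv_sec - start.tv_sec) * 1000L
                        + (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_sec * 1000L) {
            /* Deadline exceeded: kill the child and reap it. */
            kill(pid, SIGKILL);
            while (waitpid(pid, status, 0) < 0 && errno == EINTR)
                ;
            return 1;
        }
        nanosleep(&poll_delay, NULL);
    }
}
```

With something like this, a return value of 1 would be treated the same
as a failed heuristic. Note that kill() here only signals the direct
child; if the shell forks the real command instead of exec'ing it, that
grandchild would survive unless a process group is used.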

Version-Release number of selected component (if applicable):
cman-2.0.115-34.el5

How reproducible:
Every time

Steps to Reproduce:
1. Set up a cluster with qdisk
2. Define a heuristic "program" that will exceed (interval*tko) time.
3. Start qdiskd
  
Actual results:
The qdisk daemon does not notice that the heuristic "program" has run
for too long and exceeded (interval*tko).

Expected results:
The qdisk daemon should notice that the heuristic "program" has run
for too long and exceeded (interval*tko).

Additional info:

Comment 1 Lon Hohberger 2010-09-21 19:22:11 UTC

Created attachment 448783
Fix

Comment 3 Lon Hohberger 2010-09-24 22:26:45 UTC

http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=f2bfc93101e06cba918c2bb0c11ab6d668788019

Comment 7 errata-xmlrpc 2011-01-13 22:33:49 UTC

An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0036.html
