Bug 589266 - qdisk should have a timer for heuristic "programs" instead of relying on "program" to provide timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.5
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-05 17:22 UTC by Shane Bradley
Modified: 2018-11-14 19:43 UTC (History)

Fixed In Version: cman-2.0.115-55.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 636243 (view as bug list)
Environment:
Last Closed: 2011-01-13 22:33:49 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Fix (7.04 KB, patch)
2010-09-21 19:22 UTC, Lon Hohberger


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0036 0 normal SHIPPED_LIVE cman bug-fix and enhancement update 2011-01-12 17:39:38 UTC

Description Shane Bradley 2010-05-05 17:22:46 UTC
Description of problem:

If a qdisk heuristic "program" does not enforce its own timeout, the
"program" will run until it completes. The drawback is that the time
to complete can be longer than (interval*tko).

An example of this would be:
<heuristic interval="2" program="sleep 100" score="1"/>

What is needed is a timer around the calling function that terminates
the "program" once it has exceeded (interval*tko) time, treats it as
failed, and declares that the node failed the heuristic.

static int check_heuristic(struct h_data *h, int block) {
   ...
   ret = waitpid(h->childpid, &status, block?0:WNOHANG);
   ...
} 
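
For illustration only, here is a minimal sketch of the kind of timer being
requested. It is not the attached patch and not qdiskd code: the helper
name run_with_timeout() and its return values are hypothetical, and the
deadline is passed in as plain seconds (in qdiskd it would presumably be
derived from interval*tko). The sketch polls waitpid() with WNOHANG
against a monotonic clock and kills the child's process group once the
deadline passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (NOT from the qdiskd source): run `cmd` through
 * /bin/sh and wait at most `timeout_sec` seconds for it to finish.
 * Returns the exit status on normal exit, -1 on fork/wait error or
 * death by signal, and -2 if the deadline passed and it was killed. */
static int run_with_timeout(const char *cmd, int timeout_sec)
{
    const struct timespec nap = { 0, 50L * 1000 * 1000 }; /* 50 ms poll */
    struct timespec start, now;
    long elapsed_ms;
    int status;
    pid_t pid, ret;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Own process group, so grandchildren can be killed as well. */
        setpgid(0, 0);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }
    /* Also set the group from the parent to close the startup race;
     * failure here (child already exec'd) is harmless. */
    setpgid(pid, pid);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        ret = waitpid(pid, &status, WNOHANG);
        if (ret == pid)
            break;              /* child finished in time */
        if (ret < 0)
            return -1;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - start.tv_sec) * 1000L +
                     (now.tv_nsec - start.tv_nsec) / 1000000L;
        if (elapsed_ms >= timeout_sec * 1000L) {
            kill(-pid, SIGKILL);        /* kill the whole group */
            waitpid(pid, &status, 0);   /* reap, avoid a zombie */
            return -2;                  /* treat as a failed heuristic */
        }
        nanosleep(&nap, NULL);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a helper along these lines, a call such as
run_with_timeout("sleep 100", 2) would give up after about two seconds
and report a timeout instead of blocking for the full 100 seconds.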

Version-Release number of selected component (if applicable):
cman-2.0.115-34.el5

How reproducible:
Every time

Steps to Reproduce:
1. Set up a cluster with qdisk
2. Define a heuristic "program" that will exceed (interval*tko) time.
3. Start qdiskd
  
Actual results:
The qdisk daemon does not notice that the heuristic "program" has run
for too long and exceeded (interval*tko) time.

Expected results:
The qdisk daemon should notice that the heuristic "program" has run
for too long and exceeded (interval*tko) time.

Additional info:

Comment 1 Lon Hohberger 2010-09-21 19:22:11 UTC
Created attachment 448783 [details]
Fix

Comment 7 errata-xmlrpc 2011-01-13 22:33:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0036.html

