Bug 589266 - qdisk should have a timer for heuristic "programs" instead of relying on "program" to provide timeout
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2010-05-05 13:22 EDT by Shane Bradley
Modified: 2016-04-26 10:33 EDT (History)

See Also:
Fixed In Version: cman-2.0.115-55.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 636243
Environment:
Last Closed: 2011-01-13 17:33:49 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Fix (7.04 KB, patch)
2010-09-21 15:22 EDT, Lon Hohberger

Description Shane Bradley 2010-05-05 13:22:46 EDT
Description of problem:

If a qdisk heuristic "program" does not enforce its own timeout, the
"program" runs until it completes. The drawback is that the time to
complete can be longer than (interval*tko).

An example of this would be:
<heuristic interval="2" program="sleep 100" score="1"/>

What is needed is a timer around the calling function that terminates
the "program" once it has exceeded (interval*tko), treats that run as
failed, and declares that the node failed the heuristic.

static int check_heuristic(struct h_data *h, int block) {
   ...
   ret = waitpid(h->childpid, &status, block?0:WNOHANG);
   ...
} 

Version-Release number of selected component (if applicable):
cman-2.0.115-34.el5

How reproducible:
Every time

Steps to Reproduce:
1. Set up a cluster with qdisk
2. Define a heuristic "program" that will exceed (interval*tko) time.
3. Start qdiskd
  
Actual results:
The qdisk daemon does not notice that the heuristic "program" has run
for too long and exceeded (interval*tko) time.

Expected results:
The qdisk daemon should notice that the heuristic "program" has run
for too long and exceeded (interval*tko) time.

Additional info:
Comment 1 Lon Hohberger 2010-09-21 15:22:11 EDT
Created attachment 448783 [details]
Fix
Comment 7 errata-xmlrpc 2011-01-13 17:33:49 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0036.html
