Bug 168698 - fenced init should check if cman init actually joined successfully
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: fence
Version: 4
Hardware: All Linux
Priority: high
Severity: medium
Assigned To: Jim Parsons
QA Contact: Cluster QE
Depends On:
Blocks: 164914
Reported: 2005-09-19 12:50 EDT by Corey Marthaler
Modified: 2009-04-16 15:51 EDT

Fixed In Version: RHBA-2006-0242
Doc Type: Bug Fix
Last Closed: 2006-03-09 14:53:14 EST


Description Corey Marthaler 2005-09-19 12:50:09 EDT
Description of problem:
This may be related to bug 163185. If the cman init script fails to start or
join the cluster for any reason, there is no point in running the fenced init
script. The fenced init script should either check whether cman was able to join
the cluster, or fence_tool should have a timeout option so that it fails the way
the cman init script does. Otherwise the fenced init script will hang forever.
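
The timeout behavior requested above can be illustrated without touching fence_tool at all. What follows is only a minimal sketch in POSIX sh, not the fix that shipped in the errata: `run_with_timeout` is a hypothetical helper name, and the exit status 124 on timeout is borrowed from the GNU `timeout` convention as an assumption. It runs a command in the background, polls once per second, and kills the command if the deadline passes.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]   (hypothetical helper, not part of fence)
# Runs COMMAND in the background and polls once per second.
# Returns COMMAND's exit status, or 124 if it is still running after SECONDS.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    # kill -0 only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null      # deadline passed: terminate the command
            wait "$pid" 2>/dev/null      # reap it so no zombie is left behind
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                          # command finished: hand back its status
}
```

An init script could then wrap whatever blocking join command it already uses, for example `run_with_timeout 30 fence_tool join -w`, and report [FAILED] on a non-zero status instead of hanging.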

Version-Release number of selected component (if applicable):
fence_tool 1.32.6

How reproducible:
Every time cman init fails.
Comment 1 Corey Marthaler 2005-10-18 12:32:50 EDT
Here's how to reproduce this hang: run the cman init script on one node (or on
too few nodes for quorum to occur). Within the cman init script, the cman_tool
join will "pass", but the wait for quorum will eventually time out. Then run the
fenced init script. It will hang, since fence_tool doesn't have a "-t" wait flag.
 
[root@link-08 ~]# service ccsd start
Starting ccsd:                                             [  OK  ]
[root@link-08 ~]# service cman start
Starting cman:                                             [FAILED]
[root@link-08 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    3   M   link-08
[root@link-08 ~]# service fenced start
Starting fence domain: 
[HANG]

The fix for this is to add a "-t" option to fence_tool so that both tools
behave the same way.
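
The other option raised in the description, having the fenced init script check cluster state before it starts, could be sketched against the /proc/cluster/nodes layout shown in the transcript above. This is an illustration only: `is_quorate` is a hypothetical function name, and the quorum rule used here (member votes must reach floor(expected / 2) + 1) is a simplifying assumption that ignores special cases such as two-node clusters.

```shell
#!/bin/sh
# is_quorate FILE   (hypothetical helper)
# FILE uses the /proc/cluster/nodes layout:
#   Node  Votes Exp Sts  Name
# Assumption: the cluster is quorate when the votes of member nodes
# (Sts == "M") total at least floor(expected / 2) + 1.
is_quorate() {
    awk '
        NR == 1 { next }                              # skip the header row
        NF >= 5 {
            if ($3 + 0 > expected + 0) expected = $3  # highest expected-votes value seen
            if ($4 == "M") votes += $2                # count votes from members only
        }
        END {
            needed = int(expected / 2) + 1
            exit !(expected > 0 && votes >= needed)   # 0 = quorate, 1 = not quorate
        }
    ' "$1"
}
```

With the single-node output above (1 vote, 3 expected), the function returns non-zero, so a guard such as `is_quorate /proc/cluster/nodes || exit 1` would stop the fenced init script before it reaches the blocking fence_tool call.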
Comment 2 Chris Feist 2005-10-18 13:03:43 EDT
I agree with Mr. Marthaler.  If we don't add a '-t' option to fence_tool then
we'll always have the potential of hanging in the init script.  Reassigning to
the fence maintainer.
Comment 3 Corey Marthaler 2005-11-03 16:30:12 EST
FYI: without a timeout, this issue can also cause 'service fenced stop' to hang
indefinitely.
Comment 4 Jim Parsons 2005-12-06 16:01:47 EST
This is fixed in the U3 errata build.
Comment 5 Corey Marthaler 2005-12-15 17:51:51 EST
The new timeout option isn't working properly with fenced, which breaks its init
script.

init.d output (with debugging):
Starting fence domain:fence_tool: wait for quorum 1
fence_tool: get our node name
fence_tool: connect to ccs
fence_tool: start fenced
fenced: invalid option -- t
Please use '-h' for usage.
[FAILED]

# by hand:
[root@taft-03 ~]# fence_tool -c 200 join
Segmentation fault
[root@taft-03 ~]# fence_tool -t 100
fence_tool: no operation specified

[root@taft-03 ~]# fence_tool join -t 100
fenced: invalid option -- t
Please use '-h' for usage.

Comment 6 Jim Parsons 2005-12-20 13:18:47 EST
Fixed in U3 tree
Comment 7 Corey Marthaler 2006-01-24 16:31:14 EST
The init script will still hang with the latest build. I tried the fence_tool
cmdline in the initscript, and it never returns, thus causing the hang.

[root@link-08 lvm]# fence_tool -t 10 join -w
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
[continues]

[root@link-08 lvm]# rpm -q fence
fence-1.32.13-0
Comment 8 Jim Parsons 2006-02-09 10:18:48 EST
New fix checked in - thank you Lon.
Comment 9 Corey Marthaler 2006-02-16 15:38:08 EST
The 'service fenced start' fix is verified; however, the 'service fenced stop'
case in comment #3 still exists. I'll file a new bug for that so that we can
finally get this bug closed.
Comment 11 Red Hat Bugzilla 2006-03-09 14:53:14 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0242.html
