Bug 144826 - fence_tool leave: not leaving fence domain, returning success
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: fence
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-01-11 19:49 UTC by Derek Anderson
Modified: 2009-04-16 20:30 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-02-09 20:58:33 UTC
Embargoed:



Description Derek Anderson 2005-01-11 19:49:34 UTC
Description of problem:
This started happening with today's RPMs.  On each node in the cluster
run 'ccsd; modprobe cman; cman_tool join; fence_tool join'.  Then
attempt to leave the fence domain with 'fence_tool leave'.  The
command returns 0 but the node does not leave the fence domain.  Have
to 'kill -9 <fenced_pid>' to continue shutting down the cluster.

[root@link-12 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

[root@link-12 root]# fence_tool leave
[root@link-12 root]# echo $?
0
[root@link-12 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]
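
The before/after comparison above can be scripted rather than eyeballed. What follows is only a minimal Python sketch. It assumes the services listing keeps the layout shown in this report (a "Fence Domain:" line followed by a bracketed member list) and that a node which has really left no longer shows a "Fence Domain:" line; that second point is an assumption, not something this report confirms. The name fence_domain_members is hypothetical.

```python
import re


def fence_domain_members(services_text):
    """Return the member IDs listed under the first "Fence Domain:" entry.

    Returns None when no "Fence Domain:" line is present (assumed here to
    mean the node is not in a fence domain), and an empty list when the
    entry exists but no bracketed member list follows it.
    """
    lines = services_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("Fence Domain:"):
            continue
        # The member list is expected on the next non-blank line, e.g. "[1 2 3]".
        for follow in lines[i + 1:]:
            match = re.match(r"\s*\[([\d\s]*)\]", follow)
            if match:
                return [int(n) for n in match.group(1).split()]
            if follow.strip():
                break  # some other non-blank line: no member list found
        return []
    return None


if __name__ == "__main__":
    # On a cluster node, read the real file; the path is taken from this report.
    with open("/proc/cluster/services") as f:
        print(fence_domain_members(f.read()))
```

Running it before and after 'fence_tool leave' and comparing the two results would show the same thing as the transcript: with this bug present, the member list is unchanged.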

Version-Release number of selected component (if applicable):
[root@link-12 root]# fenced -V
fenced 1.7. (built Jan 10 2005 16:22:11)


Comment 1 Dean Jansa 2005-01-11 19:51:13 UTC
FWIW -- I see the same thing on my 6 node cluster...

Comment 2 David Teigland 2005-01-14 07:53:26 UTC
This is related to an old bz where we said that fence_tool join/leave
are asynchronous commands.  They return success if they are able to
initiate the join/leave, but don't wait around to see what happens.

Unfortunately, there's no good way of making the join or the leave
synchronous without a bit of work.  One quick hack is to make
fence_tool watch /proc/cluster/services to determine when the join
or leave is complete, but at that point we have "service fenced stop".
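
For illustration only, the "watch /proc/cluster/services" idea could look roughly like the Python sketch below. This is not how fence_tool is implemented. It assumes that a completed leave shows up as the "Fence Domain:" line disappearing from the services listing, and the names wait_for_leave and read_proc_services, along with the timeout and interval values, are hypothetical.

```python
import time


def read_proc_services(path="/proc/cluster/services"):
    """Read the services listing; the path is the one quoted in this bug."""
    with open(path) as f:
        return f.read()


def wait_for_leave(read_services, timeout=30.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the fence domain entry is gone or the timeout expires.

    read_services is any zero-argument callable returning the listing text.
    Returns True if the "Fence Domain:" line disappeared (assumed to mean
    the leave completed), False if it was still present at the deadline.
    """
    deadline = clock() + timeout
    while True:
        if "Fence Domain:" not in read_services():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Run after 'fence_tool leave' has been issued on this node.
    left = wait_for_leave(read_proc_services, timeout=60.0)
    print("left fence domain" if left else "still in fence domain")
```

A wrapper along these lines turns the asynchronous command into a bounded wait: a caller gets a definite answer after the timeout instead of a bare exit status of 0.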


Comment 3 Dean Jansa 2005-01-14 15:27:31 UTC
Dave --

I'm not sure this is what we are seeing.  I realize you exit without
waiting around, but the nodes never leave the fence domain.  So the
issue is we can't leave, not that we don't get an error.  (Although
that is an issue as well, but a different one)

If I issue a fence_tool leave on a single node, and wait for, oh 15
minutes -- I'm still in the fence domain.  No messages in the logs,
nothing on the console to indicate something is wrong.

/proc/cluster/services looks like:

Fence Domain:    "default"                           1   2 run       -
[1 5 3 4 2]

So that is the crux of this bug.

Comment 5 David Teigland 2005-01-18 11:27:59 UTC
Sorry about that, didn't look closely enough.

I believe you're seeing a bug I created on Jan 10 while fixing another
fenced bug.  I fixed it the next day, Jan 11, but not quickly enough
for it to get into the rpm builds.


Comment 6 Derek Anderson 2005-02-09 20:58:33 UTC
Fix verified, fence-1.15-7.

