Bug 144826 - fence_tool leave: not leaving fence domain, returning success
Status: CLOSED NEXTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: fence
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2005-01-11 14:49 EST by Derek Anderson
Modified: 2009-04-16 16:30 EDT

Doc Type: Bug Fix
Last Closed: 2005-02-09 15:58:33 EST


Attachments: None
Description Derek Anderson 2005-01-11 14:49:34 EST
Description of problem:
This started happening with today's RPMs. On each node in the cluster,
run 'ccsd; modprobe cman; cman_tool join; fence_tool join'. Then
attempt to leave the fence domain with 'fence_tool leave'. The
command returns 0, but the node does not leave the fence domain. I have
to run 'kill -9 <fenced_pid>' to continue shutting down the cluster.

[root@link-12 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]

[root@link-12 root]# fence_tool leave
[root@link-12 root]# echo $?
0
[root@link-12 root]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1 2 3]
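The membership check in the session above (is this node still listed in the fence domain, and with which members?) can be scripted. The following is a minimal sketch under stated assumptions: 'fence_domain_members' is a hypothetical helper, not part of the cluster tools, and it assumes the services file keeps the layout shown above, with the member list on the single line directly after the "Fence Domain:" line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the cluster tools): print the member node
# IDs of the "default" fence domain as listed in a cman services file.
# Assumption: the member list sits on the single line right after the
# "Fence Domain:" line, in the "[1 2 3]" form shown in this report.
fence_domain_members() {
    local services_file="${1:-/proc/cluster/services}"
    awk '
        # Found the default fence domain: the next line should hold its members.
        /^Fence Domain:/ && /"default"/ { want = 1; next }
        want && /^\[/ {
            sub(/^\[/, "")   # strip the opening bracket
            sub(/]$/, "")    # strip the closing bracket
            print
            exit
        }
        # Any other line resets the search.
        { want = 0 }
    ' "$services_file"
}
```

Run against the link-12 output above, it would print "1 2 3"; with no argument it reads /proc/cluster/services. An empty result means the node no longer lists the default fence domain.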

Version-Release number of selected component (if applicable):
[root@link-12 root]# fenced -V
fenced 1.7. (built Jan 10 2005 16:22:11)

Comment 1 Dean Jansa 2005-01-11 14:51:13 EST
FWIW, I see the same thing on my 6-node cluster...
Comment 2 David Teigland 2005-01-14 02:53:26 EST
This is related to an old bz where we said that fence_tool join/leave
are asynchronous commands.  They return success if they are able to
initiate the join/leave, but don't wait around to see what happens.

Unfortunately, there's no good way of making the join or the leave
synchronous without a bit of work.  One quick hack is to make
fence_tool watch /proc/cluster/services to determine when the join
or leave is complete, but at that point we have "service fenced stop".
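The quick hack described here (watching /proc/cluster/services until the leave finishes) might look roughly like the following. This is a sketch under assumptions, not fence_tool's actual behavior: 'wait_for_fence_leave' is a hypothetical wrapper, and it assumes that once a leave completes, the "default" Fence Domain line disappears from the leaving node's services file.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: after an asynchronous 'fence_tool leave', poll the
# services file until the "default" fence domain is no longer listed, or
# give up after a timeout.
# Assumption: a completed leave removes the "Fence Domain:" line for
# "default" from the leaving node's services file.
wait_for_fence_leave() {
    local services_file="${1:-/proc/cluster/services}"
    local timeout="${2:-60}"   # seconds to wait before reporting failure
    local waited=0

    # Refuse to guess if the file is missing (e.g. cman not loaded).
    if [ ! -r "$services_file" ]; then
        echo "cannot read $services_file" >&2
        return 2
    fi

    # Loop while the node still lists the default fence domain.
    while grep -q '^Fence Domain:.*"default"' "$services_file"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still in fence domain after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, 'fence_tool leave; wait_for_fence_leave /proc/cluster/services 900' would wait up to 15 minutes. With the bug in this report, such a wrapper would time out and return 1 instead of reporting success.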
Comment 3 Dean Jansa 2005-01-14 10:27:31 EST
Dave --

I'm not sure this is what we are seeing. I realize you exit without
waiting around, but the nodes never leave the fence domain. So the
issue is that we can't leave, not that we don't get an error. (That
is an issue as well, but a different one.)

If I issue a fence_tool leave on a single node and wait, oh, 15
minutes, I'm still in the fence domain. There are no messages in the
logs and nothing on the console to indicate that something is wrong.

/proc/cluster/services looks like:

Fence Domain:    "default"                           1   2 run       -
[1 5 3 4 2]

So that is the crux of this bug.
Comment 5 David Teigland 2005-01-18 06:27:59 EST
Sorry about that, didn't look closely enough.

I believe you're seeing a bug I created on Jan 10 while fixing another
fenced bug.  I fixed it the next day, Jan 11, but not quickly enough
for it to get into the rpm builds.
Comment 6 Derek Anderson 2005-02-09 15:58:33 EST
Fix verified, fence-1.15-7.
