Bug 210682

Summary: fenced can get into a state where it doesn't leave the cluster
Product: Red Hat Enterprise Linux 5
Reporter: Corey Marthaler <cmarthal>
Component: kernel
Assignee: David Teigland <teigland>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 5.0
CC: ccaulfie, cluster-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-12-12 17:56:58 UTC

Description Corey Marthaler 2006-10-13 19:00:35 UTC
Description of problem:
This may be related to bug 210641, or to other cman/dlm daemon start/stop
issues that have been filed.

I've been playing around with starting and stopping cman in loops on a
four-node cluster, and I always seem to get into a state where fenced won't die
on one or more of the machines in the cluster.

[root@taft-01 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]

[root@taft-02 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... failed
/usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
                                                           [FAILED]

[root@taft-03 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
[root@taft-04 ~]# service cman stop
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]


[root@taft-02 ~]# pidof fenced
7035
[root@taft-02 ~]# fence_tool leave
[root@taft-02 ~]# echo $?
0
[root@taft-02 ~]# pidof fenced
7035
[root@taft-02 ~]# cman_tool services
type             level name     id       state
fence            0     default  00010003 none
[2]


Version-Release number of selected component (if applicable):
[root@taft-02 ~]# uname -ar
Linux taft-02 2.6.18-1.2714.el5 #1 SMP Mon Oct 2 17:11:34 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Often
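The start/stop loop described at the top of this report can be sketched as a shell function. It assumes the RHEL 5 `service cman start|stop` init script and `pidof`, both as used in the transcripts above; the function name `cman_cycle`, the default of 10 iterations, and stopping at the first failure are illustrative choices, not details from the report. The file only defines the function, so source it on each node and then call it.

```shell
# Sketch only: cycles cman with the RHEL 5 init script, as in the transcripts
# above. The function name, default iteration count, and stop-on-first-failure
# behavior are assumptions, not taken from the report.

# cman_cycle [N]: start and stop cman up to N times (default 10). Prints a
# diagnostic and returns 1 as soon as a start or stop fails, or fenced
# survives a clean stop.
cman_cycle() {
    n=${1:-10}
    i=1
    while [ "$i" -le "$n" ]; do
        if ! service cman start >/dev/null 2>&1; then
            echo "iteration $i: cman start failed"
            return 1
        fi
        if ! service cman stop >/dev/null 2>&1; then
            echo "iteration $i: cman stop failed"
            # On taft-02 the stop failed while fenced was still running.
            if pid=$(pidof fenced); then
                echo "iteration $i: fenced still running (pid $pid)"
            fi
            return 1
        fi
        # A clean stop should leave no fenced process behind.
        if pid=$(pidof fenced); then
            echo "iteration $i: fenced still running after clean stop (pid $pid)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $n cycles"
}
```

For example, `. ./cman_cycle.sh && cman_cycle 50` on each node (the file name is hypothetical). Stopping at the first failure leaves the node in the stuck state, so the diagnostics requested in Comment 1 can still be gathered.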

Comment 1 David Teigland 2006-10-17 18:13:39 UTC
Next time this happens, collect the following from each node:

group_tool -v
group_tool dump
group_tool dump fence
anything from /var/log/messages
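The data collection requested above can be scripted so that each node's output lands in its own directory, which makes the files easier to attach to the bug. This is a minimal sketch: `collect_group_debug`, the `/tmp/group-debug` default, the optional log-file argument, and the output file names are hypothetical; only the three `group_tool` invocations and the `/var/log/messages` path come from the comment.

```shell
# Sketch only: gathers the diagnostics listed above into one directory per
# node. The function name, output layout, and file names are hypothetical;
# only the group_tool commands and the log path come from the comment.

# collect_group_debug [DIR] [LOGFILE]: write the outputs under DIR/<hostname>/
# (default /tmp/group-debug) and print that directory on stdout.
collect_group_debug() {
    dest=${1:-/tmp/group-debug}/$(uname -n)
    log=${2:-/var/log/messages}
    mkdir -p "$dest" || return 1
    group_tool -v > "$dest/group_tool-v.txt" 2>&1
    group_tool dump > "$dest/group_tool-dump.txt" 2>&1
    group_tool dump fence > "$dest/group_tool-dump-fence.txt" 2>&1
    # Copy the whole log; trimming it to the time of the failure is left to the reader.
    cp "$log" "$dest/messages" || echo "warning: could not copy $log" >&2
    echo "$dest"
}
```

Running it on every node with the same DIR on shared or collected storage keeps the per-node outputs side by side.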


Comment 2 David Teigland 2006-12-12 17:56:58 UTC

*** This bug has been marked as a duplicate of 217449 ***

Comment 3 Nate Straz 2007-12-13 17:40:44 UTC
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5, which never existed.