Bug 228101

Summary: when a node is fenced it cannot rejoin the cluster
Product: Red Hat Enterprise Linux 5
Reporter: Josef Bacik <jbacik>
Component: openais
Assignee: Steven Dake <sdake>
Status: CLOSED NOTABUG
Severity: high
Priority: high
Version: 5.0
CC: cluster-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-02-14 18:58:37 UTC

Description Josef Bacik 2007-02-09 22:57:14 UTC
Description of problem:
While doing GFS2 testing, I found that if I start cman on one of my nodes
before the other node has started it, the first node fences the second, as
expected.  The problem is that when the second node comes back up it cannot
join the cluster, and the node that is already running just loops, logging
the following in /var/log/messages:

Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] Sending initial ORF token
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering OPERATIONAL state.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] got nodejoin message 10.10.1.13
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 11.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 0.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Creating commit token because I am the rep.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Saving state aru 9 high seq received 9
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering COMMIT state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering RECOVERY state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] position [0] member 10.10.1.13:
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] previous ring seq 160 rep 10.10.1.13
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] aru 9 high delivered 8 received flag 0
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Did not need to originate any messages in recovery.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Storing new sequence id for ring a4

I will look into this more next week, but I'm still in the process of reading 
the openais code so I'm not in a position to intelligently troubleshoot this 
yet.
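
A rough way to confirm the loop from the log is to count how often openais
re-enters the GATHER state. This is only a sketch: the function name
count_gather is made up for illustration, and it assumes the syslog file
contains the "entering GATHER state" message exactly as shown in the excerpt
above.

```shell
# Hypothetical helper: count how many times openais logged a transition
# into the GATHER state in the given syslog file.
# $1: path to the log file (for example /var/log/messages)
count_gather() {
    grep -c 'entering GATHER state' "$1"
}

# Example (not run here): count_gather /var/log/messages
```

A count that keeps climbing while the membership list stays at a single node
would be consistent with the looping described above.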

Version-Release number of selected component (if applicable):

[root@rh5cluster2 ~]# rpm -q openais
openais-0.80.2-1.el5

How reproducible:
Every time

Steps to Reproduce:
1. Bring both nodes up without starting cman.
2. Start cman on one node and let it fence the other node.
  
Actual results:
The fenced node isn't allowed to join the cluster and the node that is 
currently up just loops.

Expected results:
It should let the node join.

Comment 1 Josef Bacik 2007-02-14 18:58:37 UTC
OK, I'm an idiot: I had iptables turned on on the second node.  Closing this.
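
For anyone hitting the same symptom, a quick check along these lines might
save some time. It is a sketch only: has_blocking_rule is a hypothetical
helper name, and it assumes the rules are supplied in iptables-save format on
stdin. It does not check specific ports; it just reports whether any rule or
chain policy drops or rejects traffic at all.

```shell
# Hypothetical helper: read iptables-save style output on stdin and succeed
# (exit status 0) if any rule jumps to DROP or REJECT, or if any chain has a
# default policy of DROP.
has_blocking_rule() {
    grep -Eq -e '-j (DROP|REJECT)' -e '^:[A-Za-z0-9_-]+ DROP'
}

# Example, as root on the node that cannot join (not run here):
#   iptables-save | has_blocking_rule && echo "iptables may be blocking cluster traffic"
```

If the check succeeds, the firewall rules on that node are worth reviewing
before digging into openais itself.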