Bug 228101 - when a node is fenced it cannot rejoin the cluster
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.0
Hardware: All Linux
Priority: high
Severity: high
Assigned To: Steven Dake
Reported: 2007-02-09 17:57 EST by Josef Bacik
Modified: 2016-04-26 09:48 EDT

Doc Type: Bug Fix
Last Closed: 2007-02-14 13:58:37 EST
Description Josef Bacik 2007-02-09 17:57:14 EST
Description of problem:
While doing GFS2 testing, I found that if I start cman on one of my nodes
before the other node has started it, the first node fences the second, as
expected.  The problem is that when the second node comes back up it cannot
join the cluster, and the node that is already running just loops, writing
the following to /var/log/messages:

Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] Sending initial ORF token
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] CLM CONFIGURATION CHANGE
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Left:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] Members Joined:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [SYNC ] This node is within the primary component and will provide service.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering OPERATIONAL state.
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] got nodejoin message 10.10.1.13
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 11.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 0.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Creating commit token because I am the rep.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Saving state aru 9 high seq received 9
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering COMMIT state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering RECOVERY state.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] position [0] member 10.10.1.13:
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] previous ring seq 160 rep 10.10.1.13
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] aru 9 high delivered 8 received flag 0
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Did not need to originate any messages in recovery.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] Storing new sequence id for ring a4

I will look into this more next week, but I'm still in the process of reading 
the openais code so I'm not in a position to intelligently troubleshoot this 
yet.
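
For anyone triaging a similar loop, here is a rough sketch of how the log
above could be summarized. It is not part of openais or cman; the function
name summarize_openais_log and the regular expression are made up for this
illustration and only assume the message formats visible in the excerpt. It
collects the member addresses listed under "New Configuration" and counts how
often TOTEM re-enters the GATHER state. A membership that stays at a single
address while the GATHER count keeps climbing suggests the running node keeps
re-forming the ring on its own.

```python
import re

# Matches member lines such as "[CLM  ]      r(0) ip(10.10.1.13)".
MEMBER_RE = re.compile(r"\[CLM\s*\]\s+r\(\d+\)\s+ip\(([^)]+)\)")


def summarize_openais_log(text):
    """Return (member_ips, gather_count) for an openais log excerpt."""
    members = set()
    gather_count = 0
    for line in text.splitlines():
        match = MEMBER_RE.search(line)
        if match:
            members.add(match.group(1))
        if "entering GATHER state" in line:
            gather_count += 1
    return members, gather_count


sample = """\
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ] New Configuration:
Feb  9 17:54:55 rh5cluster1 openais[3839]: [CLM  ]      r(0) ip(10.10.1.13)
Feb  9 17:54:55 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 11.
Feb  9 17:55:00 rh5cluster1 openais[3839]: [TOTEM] entering GATHER state from 0.
"""

members, gathers = summarize_openais_log(sample)
print(sorted(members), gathers)
```

On the four sample lines this reports one member (10.10.1.13) and two GATHER
transitions, which matches what the full excerpt shows: the second node never
appears in the configuration.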

Version-Release number of selected component (if applicable):

[root@rh5cluster2 ~]# rpm -q openais
openais-0.80.2-1.el5

How reproducible:
Every time

Steps to Reproduce:
1. Bring both nodes up without starting cman.
2. Start cman on one node and let it fence the other node.
  
Actual results:
The fenced node isn't allowed to join the cluster and the node that is 
currently up just loops.

Expected results:
It should let the node join.
Comment 1 Josef Bacik 2007-02-14 13:58:37 EST
OK, I'm an idiot: I had iptables turned on on the second node.  Closing this.
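
Since the cause turned out to be a firewall, a quick check of the rule set can
rule this out early. The sketch below is a heuristic under stated assumptions,
not an authoritative test: it assumes the cluster traffic uses UDP ports 5404
and 5405 (verify this against your own cluster configuration), it expects rule
lines in the "-A CHAIN ..." form that iptables-save prints, and the function
names are invented for this example. It only looks for explicit ACCEPT rules
on those ports; it does not evaluate rule order, chain jumps, or default
policies.

```python
import re

# Assumption: UDP ports used by the cluster stack; confirm for your setup.
CLUSTER_UDP_PORTS = {5404, 5405}

PORT_RE = re.compile(r"--dports?\s+(\S+)")
UDP_RE = re.compile(r"-p\s+udp(\s|$)")
ACCEPT_RE = re.compile(r"-j\s+ACCEPT(\s|$)")


def accepted_udp_ports(rules):
    """Collect UDP destination ports that have an explicit ACCEPT rule."""
    accepted = set()
    for rule in rules.splitlines():
        if not rule.startswith("-A "):
            continue  # not an append-rule line
        if not UDP_RE.search(rule) or not ACCEPT_RE.search(rule):
            continue
        match = PORT_RE.search(rule)
        if not match:
            continue
        for part in match.group(1).split(","):
            bounds = part.split(":")
            if not all(b.isdigit() for b in bounds):
                continue  # skip named ports and open-ended ranges
            accepted.update(range(int(bounds[0]), int(bounds[-1]) + 1))
    return accepted


def missing_cluster_ports(rules):
    """Return cluster ports that have no explicit ACCEPT rule, sorted."""
    return sorted(CLUSTER_UDP_PORTS - accepted_udp_ports(rules))


sample_rules = """\
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p udp -m udp --dport 5404 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
"""

print(missing_cluster_ports(sample_rules))
```

An empty result only means both ports have an ACCEPT rule somewhere; a
non-empty result is a hint to look at the firewall before digging into the
openais code.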
