Bug 214290 - send plock error -1 in gfs_controld logs
Summary: send plock error -1 in gfs_controld logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openais
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-11-06 22:10 UTC by Abhijith Das
Modified: 2016-04-26 13:47 UTC
CC List: 5 users

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-28 21:35:03 UTC
Target Upstream Version:
Embargoed:


Attachments
group_tool dump on all nodes (15.41 KB, application/x-gzip), 2006-11-06 22:22 UTC, Abhijith Das
/var/log/messages on the smoke cluster nodes (304.59 KB, application/x-gzip), 2006-11-06 22:23 UTC, Abhijith Das

Description Abhijith Das 2006-11-06 22:10:43 UTC
Description of problem:
While running revolver on the smoke cluster with I/O load on three GFS
filesystems, one of the nodes (merit) logged the following:

Nov  6 15:27:47 merit gfs_controld[1774]: cpg_mcast_joined error 2 handle 6b8b456700000000 MSG_PLOCK
Nov  6 15:27:47 merit gfs_controld[1774]: send plock error -1
Nov  6 15:27:47 merit gfs_controld[1774]: cpg_mcast_joined error 2 handle 6b8b456700000000 MSG_PLOCK
Nov  6 15:27:47 merit gfs_controld[1774]: send plock error -1
Nov  6 15:27:47 merit gfs_controld[1774]: cpg connection died
Nov  6 15:27:47 merit gfs_controld[1774]: cluster is down, exiting

The node chosen to be killed in this revolver iteration was winston.
The various logs are attached.
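
A minimal sketch of the send path these messages imply (illustrative C, not
the actual gfs_controld source; send_plock_message and its setup are invented
for illustration): gfs_controld multicasts plock traffic to its CPG group, and
when the CPG library call fails, the send wrapper returns -1, which is what
"send plock error -1" reports. Error 2 here corresponds to CPG_ERR_LIBRARY in
openais's cpg.h (a broken library connection to aisexec), consistent with the
"cpg connection died" line that follows.

#include <stdio.h>
#include <sys/uio.h>
#include <openais/cpg.h>

static cpg_handle_t handle;  /* obtained elsewhere via cpg_initialize() and cpg_join() */

/* Hypothetical wrapper: multicast one plock message to the joined group. */
static int send_plock_message(void *buf, size_t len)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        cpg_error_t error;

        error = cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
        if (error != CPG_OK) {
                fprintf(stderr, "cpg_mcast_joined error %d handle %llx MSG_PLOCK\n",
                        error, (unsigned long long) handle);
                return -1;  /* surfaces in the log as "send plock error -1" */
        }
        return 0;
}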

Comment 1 Abhijith Das 2006-11-06 22:15:14 UTC
cman_tool nodes output:
[root@camel ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  58804   2006-11-06 15:28:26  camel
   2   M  58804   2006-11-06 15:28:26  merit
   3   M  58804   2006-11-06 15:28:26  winston
   4   M  58804   2006-11-06 15:28:26  kool
   5   M  58804   2006-11-06 15:28:26  salem

[root@merit ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  58804   2006-11-06 15:28:36  camel
   2   M  58764   2006-11-06 15:17:10  merit
   3   M  58792   2006-11-06 15:27:00  winston
   4   M  58780   2006-11-06 15:18:17  kool
   5   M  58776   2006-11-06 15:18:11  salem

[root@winston ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  58804   2006-11-06 15:29:28  camel
   2   M  58792   2006-11-06 15:27:52  merit
   3   M  58784   2006-11-06 15:27:52  winston
   4   M  58792   2006-11-06 15:27:52  kool
   5   M  58792   2006-11-06 15:27:52  salem

[root@kool ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  58804   2006-11-06 15:30:46  camel
   2   M  58780   2006-11-06 15:20:28  merit
   3   M  58792   2006-11-06 15:29:11  winston
   4   M  58780   2006-11-06 15:20:28  kool
   5   M  58780   2006-11-06 15:20:28  salem

[root@salem ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M  58804   2006-11-06 15:32:27  camel
   2   M  58776   2006-11-06 15:22:03  merit
   3   M  58792   2006-11-06 15:30:51  winston
   4   M  58780   2006-11-06 15:22:08  kool
   5   M  58764   2006-11-06 15:22:03  salem

group_tool -v output:
[root@camel ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none        
[1 2 3 4 5]
dlm              1     clvmd    00010001 none        
[1 2 3 4 5]

[root@merit ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none        
[1 2 3 4 5]
dlm              1     clvmd    00010001 none        
[1 2 3 4 5]
dlm              1     soot     00020002 FAIL_START_WAIT 1 100030003 0
[2 4 5]
dlm              1     ash      00040002 FAIL_START_WAIT 1 100030003 0
[2 4 5]
dlm              1     cancer   00060002 FAIL_START_WAIT 1 100030003 0
[2 4 5]
gfs              2     soot     00010002 FAIL_START_WAIT 1 100030003 0
[2 4 5]
gfs              2     ash      00030002 FAIL_START_WAIT 1 100030003 0
[2 4 5]
gfs              2     cancer   00050002 FAIL_START_WAIT 1 100030003 0
[2 4 5]

[root@winston ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none        
[1 2 3 4 5]
dlm              1     clvmd    00010001 none        
[1 2 3 4 5]
gfs              2     soot     00000000 JOIN_STOP_WAIT 3 300040001 1
[2 3 4 5]

[root@kool ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none        
[1 2 3 4 5]
dlm              1     clvmd    00010001 none        
[1 2 3 4 5]
dlm              1     soot     00020002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
dlm              1     ash      00040002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
dlm              1     cancer   00060002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     soot     00010002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     ash      00030002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     cancer   00050002 FAIL_START_WAIT 1 100030003 1
[2 4 5]

[root@salem ~]# group_tool -v
type             level name     id       state node id local_done
fence            0     default  00010003 none        
[1 2 3 4 5]
dlm              1     clvmd    00010001 none        
[1 2 3 4 5]
dlm              1     soot     00020002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
dlm              1     ash      00040002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
dlm              1     cancer   00060002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     soot     00010002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     ash      00030002 FAIL_START_WAIT 1 100030003 1
[2 4 5]
gfs              2     cancer   00050002 FAIL_START_WAIT 1 100030003 1
[2 4 5]


Comment 2 Abhijith Das 2006-11-06 22:16:46 UTC
cman_tool status output:
[root@camel ~]# cman_tool status
Version: 6.0.1
Config Version: 1
Cluster Name: smoke
Cluster Id: 3471
Cluster Member: Yes
Cluster Generation: 58804
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 11  
Node name: camel
Node ID: 1
Multicast addresses: 239.192.13.156 
Node addresses: 10.15.89.52 

[root@merit ~]# cman_tool status
Version: 6.0.1
Config Version: 1
Cluster Name: smoke
Cluster Id: 3471
Cluster Member: Yes
Cluster Generation: 58804
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 6
Flags: 
Ports Bound: 0 11  
Node name: merit
Node ID: 2
Multicast addresses: 239.192.13.156 
Node addresses: 10.15.89.54 

[root@winston ~]# cman_tool status
Version: 6.0.1
Config Version: 1
Cluster Name: smoke
Cluster Id: 3471
Cluster Member: Yes
Cluster Generation: 58804
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 11  
Node name: winston
Node ID: 3
Multicast addresses: 239.192.13.156 
Node addresses: 10.15.89.53 

[root@kool ~]# cman_tool status
Version: 6.0.1
Config Version: 1
Cluster Name: smoke
Cluster Id: 3471
Cluster Member: Yes
Cluster Generation: 58804
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 11  
Node name: kool
Node ID: 4
Multicast addresses: 239.192.13.156 
Node addresses: 10.15.89.56 

[root@salem ~]# cman_tool status
Version: 6.0.1
Config Version: 1
Cluster Name: smoke
Cluster Id: 3471
Cluster Member: Yes
Cluster Generation: 58804
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 11  
Node name: salem
Node ID: 5
Multicast addresses: 239.192.13.156 
Node addresses: 10.15.89.57 


Comment 3 Abhijith Das 2006-11-06 22:22:13 UTC
Created attachment 140519 [details]
group_tool dump on all nodes

Comment 4 Abhijith Das 2006-11-06 22:24:04 UTC
Created attachment 140520 [details]
/var/log/messages on the smoke cluster nodes:

Comment 5 Kiersten (Kerri) Anderson 2006-11-06 22:37:18 UTC
Beta2 blocker proposed.  The problem was found when running revolver on the
smoke cluster in the same configuration as the QE release criteria.  The test
is being restarted to collect more information.

Comment 6 RHEL Program Management 2006-11-06 22:46:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.

Comment 7 Steven Dake 2006-11-09 02:58:34 UTC
I believe I have identified the cause of the weird configuration changes
seen in the QA and engineering labs.

Every time a new processor is added to or an existing processor is removed
from the configuration, a new round of consensus gathering (membership
determination) occurs.  This is what revolver exercises.  A join message is
sent upon entering the gather phase of the protocol, and a join timer is
started (100 msec timeout).  If consensus is not reached within 100 msec, the
join timeout expires, a new join message is sent, and the join timer is
restarted.  This means the join message is sent every 100 msec until consensus
is reached.  To bound the period during which consensus is attempted, a
consensus timer is started on entry to the gather phase.  When this consensus
timer expires, any processors with which consensus could not be reached are
added to the failed list, gather is entered again, and the process repeats
until a membership is formed.  The sketch below illustrates this timer
interplay.
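
A hedged, self-contained sketch of the two timers just described (illustrative
C, not the openais totemsrp source; every helper name here is an invented
stand-in, defined as a stub so the sketch compiles and runs):

#include <stdio.h>

#define JOIN_TIMEOUT_MS      100  /* original default */
#define CONSENSUS_TIMEOUT_MS 200  /* original default */

/* Invented no-op stand-ins for the real timer/message machinery. */
static void send_join_message(void) { puts("join message sent"); }
static void start_timer(int msec, void (*expired)(void)) { (void)msec; (void)expired; }
static void move_nonconsensus_nodes_to_failed_list(void) { }
static int consensus_reached(void) { return 0; }

static void join_timeout_expired(void);
static void consensus_timeout_expired(void);

/* On entry to the gather phase: send a join and arm both timers. */
static void enter_gather(void)
{
        send_join_message();
        start_timer(JOIN_TIMEOUT_MS, join_timeout_expired);
        start_timer(CONSENSUS_TIMEOUT_MS, consensus_timeout_expired);
}

/* Join timer: resend the join every 100 msec until consensus is reached. */
static void join_timeout_expired(void)
{
        if (consensus_reached())
                return;
        send_join_message();
        start_timer(JOIN_TIMEOUT_MS, join_timeout_expired);
}

/* Consensus timer: bound the gather phase by failing the silent
 * processors, then re-enter gather until a membership forms. */
static void consensus_timeout_expired(void)
{
        move_nonconsensus_nodes_to_failed_list();
        enter_gather();
}

int main(void) { enter_gather(); return 0; }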

In this case, network overload can cause the join message to be lost by some
of the processors in the network.  The consensus timeout is configured to
200 msec, so the join message is resent only once (two sends in total), unless
a new join message is triggered because a processor identified a new potential
member.  This causes a node to be excluded from the membership.  A short while
later, the nodes multicast a message.  If the message is from a different
ring, the membership protocol is started (gather is entered, see above).  This
time around, however, consensus is reached because the messages or retries get
through to the nodes.  This behavior confuses the upper-level components.

The consensus timeout must be less than the token timeout.  The ratio of the
consensus timeout to the join timeout determines the number of times a join
message is resent.

I suggest changing the join timeout to 60 msec and the consensus timeout to
4800 msec to offer the greatest possibility of forming a new configuration
under network overload conditions.  These changes are made in the cman parser,
so cman must be rebuilt after it is modified.
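
As a back-of-the-envelope check of the headroom this adds (a sketch; the
counting is approximate and ignores timer races at the expiry instant): with
the old values a join lost twice ends the gather round, while the proposed
values allow roughly 80 send attempts per round.

#include <stdio.h>

/* Joins sent during one gather phase if consensus is never reached:
 * one on entry, plus one per join-timeout expiry before the
 * consensus timeout fires. */
static int join_sends(int join_ms, int consensus_ms)
{
        return 1 + (consensus_ms - 1) / join_ms;
}

int main(void)
{
        printf("join=100ms consensus=200ms  -> %d sends\n", join_sends(100, 200));  /* 2 */
        printf("join=60ms  consensus=4800ms -> %d sends\n", join_sends(60, 4800));  /* 80 */
        return 0;
}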

I have also identified a discrepancy between the specification and the code
which causes mp5 to fail after 30-45 mins.  I don't completely understand the
scenario, but I defer to the specification and the fact that mp5 now runs
properly.  The patch is a one-liner and must have been missed in a previous
commit, because it is present in one of my work trees under which I ran mp5
for several days without failure.  This requires a rebuild of the openais
package.

Comment 8 Christine Caulfield 2006-11-09 13:11:02 UTC
cman defaults changed:

Checking in ais.c;
/cvs/cluster/cluster/cman/daemon/ais.c,v  <--  ais.c
new revision: 1.45; previous revision: 1.44
done


Comment 10 Steven Dake 2006-11-09 17:16:36 UTC
Fixed in openais-0.80.1-15.el5 and a newer version of cman.

