Bug 469899

Summary: reported ccsd updates not effective
Product: Red Hat Enterprise Linux 5
Reporter: Steve Reichard <sreichar>
Component: cman
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 5.2
CC: cluster-maint, edamato
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-02-16 14:05:58 UTC

Description Steve Reichard 2008-11-04 16:59:22 UTC
Description of problem:

Heterogeneous cluster of HP DL580 G5 (Intel 16 core, 64 GB and AMD 8 core, 72 GB)
running RHEL 5.2 AP, recently updated.

This problem may be related to Bugzillas: 469874 and 469888

FYI: rgmanager disabled

The cluster configuration file was updated to remove a test service, and ccs_tool update and cman_tool version -r were issued. The messages file logs the update; however, shortly afterwards the node was fenced. On the boot after the fence, a version mismatch was reported. This resolved itself on the next boot (after an OOM panic).

Nov  3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).

Nov  3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6




Version-Release number of selected component (if applicable):
[root@renoir crash]# cat /etc/redhat-release ;  uname -a
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
Linux renoir.lab.bos.redhat.com 2.6.18-92.1.13.el5xen #1 SMP Thu Sep 4 04:07:08 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@renoir crash]#



How reproducible:

While I was unable to force a reproduction, the following was noticed:

[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete"  messages
Nov  3 08:01:37 renoir ccsd[7230]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 08:07:52 renoir openais[7259]: [CMAN ] Node 2 conflict, remote config version id=1, local=2 
Nov  3 08:11:42 renoir ccsd[7250]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 08:12:37 renoir ccsd[7250]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 08:23:36 renoir openais[7265]: [CMAN ] Node 2 conflict, remote config version id=2, local=4 
Nov  3 10:02:12 renoir ccsd[9752]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 10:03:16 renoir ccsd[9752]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 10:04:09 renoir ccsd[9752]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 13:01:52 renoir ccsd[9604]: Update of cluster.conf complete (version 4 -> 5). 
Nov  3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6). 
Nov  3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6 
Nov  4 11:11:36 renoir ccsd[7271]: Update of cluster.conf complete (version 6 -> 7). 
[root@renoir log]# 


[root@monet log]#  grep -e conflict -e "Update of cluster.conf complete"  messages
Nov  3 08:01:37 monet ccsd[18166]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 08:07:52 monet openais[18173]: [CMAN ] Node 2 conflict, remote config version id=1, local=2 
Nov  3 08:23:36 monet openais[18173]: [CMAN ] Node 1 conflict, remote config version id=4, local=2 
Nov  3 10:02:12 monet ccsd[12218]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 10:03:16 monet ccsd[12218]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 10:04:09 monet ccsd[12218]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 13:01:52 monet ccsd[12218]: Update of cluster.conf complete (version 4 -> 5). 
Nov  3 15:10:50 monet ccsd[8302]: Update of cluster.conf complete (version 5 -> 6). 
Nov  3 15:16:06 monet openais[8309]: [CMAN ] Node 2 conflict, remote config version id=5, local=6 
Nov  4 11:11:32 monet ccsd[8302]: Update of cluster.conf complete (version 6 -> 7). 
Nov  4 11:11:50 monet openais[8309]: [CMAN ] Node 1 conflict, remote config version id=6, local=7 
[root@monet log]# 




Expanding the search to the rotated log files:
[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete"  messages\.*
messages.1:Oct 27 09:25:07 renoir ccsd[6171]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 27 09:54:57 renoir ccsd[6171]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 27 09:56:11 renoir ccsd[6171]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 27 11:08:35 renoir openais[7012]: [CMAN ] Node 2 conflict, remote config version id=5, local=4 
messages.1:Oct 27 11:49:49 renoir ccsd[7027]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 27 11:50:08 renoir ccsd[7027]: Update of cluster.conf complete (version 6 -> 7). 
messages.1:Oct 27 14:07:04 renoir ccsd[7395]: Update of cluster.conf complete (version 7 -> 8). 
messages.1:Oct 28 10:42:50 renoir ccsd[7394]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 28 10:48:22 renoir ccsd[7394]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 28 11:36:25 renoir ccsd[20292]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 28 15:06:45 renoir ccsd[7394]: Update of cluster.conf complete (version 4 -> 5). 
messages.1:Oct 28 15:08:25 renoir ccsd[7394]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 28 15:35:05 renoir openais[7398]: [CMAN ] Node 2 conflict, remote config version id=4, local=6 
messages.1:Oct 29 17:30:01 renoir ccsd[10482]: Update of cluster.conf complete (version 1 -> 2). 


[root@monet log]#  grep -e conflict -e "Update of cluster.conf complete"  messages\.*
messages.1:Oct 27 09:25:07 monet ccsd[31211]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 27 09:54:57 monet ccsd[31211]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 27 09:56:11 monet ccsd[31211]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 27 11:08:42 monet openais[8238]: [CMAN ] Node 1 conflict, remote config version id=4, local=5 
messages.1:Oct 27 11:49:53 monet ccsd[8230]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 27 11:50:08 monet ccsd[8230]: Update of cluster.conf complete (version 6 -> 7). 
messages.1:Oct 27 14:07:04 monet ccsd[8591]: Update of cluster.conf complete (version 7 -> 8). 
messages.1:Oct 28 10:42:50 monet ccsd[8560]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 28 10:48:22 monet ccsd[8560]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 28 11:36:25 monet ccsd[23631]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 28 15:06:55 monet ccsd[31396]: Update of cluster.conf complete (version 4 -> 5). 
messages.1:Oct 28 15:08:36 monet ccsd[31396]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 28 15:35:14 monet openais[31427]: [CMAN ] Node 2 conflict, remote config version id=4, local=6 
messages.1:Oct 29 17:30:09 monet ccsd[12661]: Update of cluster.conf complete (version 1 -> 2). 
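
The grep output above can also be screened mechanically. The following is a minimal Python sketch, not part of the original report, that parses the two message formats quoted here and lists the conflicts in which the remote node reported an older configuration version than the local one. The function names (`parse_events`, `stale_conflicts`) are hypothetical, and the regular expressions assume the log lines look exactly like the ones shown in this report.

```python
import re

# Patterns for the two syslog messages quoted in this report.
UPDATE_RE = re.compile(
    r"(?P<host>\S+) ccsd\[\d+\]: Update of cluster\.conf complete "
    r"\(version (?P<old>\d+) -> (?P<new>\d+)\)"
)
CONFLICT_RE = re.compile(
    r"(?P<host>\S+) openais\[\d+\]: \[CMAN \] Node (?P<node>\d+) conflict, "
    r"remote config version id=(?P<remote>\d+), local=(?P<local>\d+)"
)


def parse_events(lines):
    """Return a list of dicts, one per update or conflict message found."""
    events = []
    for line in lines:
        m = UPDATE_RE.search(line)
        if m:
            events.append({"kind": "update", "host": m["host"],
                           "old": int(m["old"]), "new": int(m["new"])})
            continue
        m = CONFLICT_RE.search(line)
        if m:
            events.append({"kind": "conflict", "host": m["host"],
                           "node": int(m["node"]),
                           "remote": int(m["remote"]),
                           "local": int(m["local"])})
    return events


def stale_conflicts(events):
    """Conflicts where the remote node reported an older version than the local one."""
    return [e for e in events
            if e["kind"] == "conflict" and e["remote"] < e["local"]]


# Two lines copied from the renoir log in this report.
sample = [
    "Nov  3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).",
    "Nov  3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6",
]
events = parse_events(sample)
for e in stale_conflicts(events):
    print(f"{e['host']}: node {e['node']} is at version {e['remote']}, local is {e['local']}")
```

Feeding the full messages files from both nodes through `parse_events` would give a combined timeline, which is closer to the unfiltered logs that are more useful for diagnosis than grepped excerpts.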









Comment 1 Christine Caulfield 2008-11-06 09:36:37 UTC
Some more coeval syslogs would be helpful, ideally not quite as expurgated as those. Also, it would be helpful if the times were synchronised on the systems.

It's hard to make sense of those logs because so much is missing. It might fill up bugzilla but whole logs from before and after an incident (from all systems) are much more useful than carefully grepped logs from a whole day.

Also, if the node was fenced very soon after the ccsd update, it might just be that the new cluster.conf was not flushed to disk.
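
To illustrate the flush-to-disk point in general terms: a file that has been written but not synced can be lost if the machine is power-cycled shortly afterwards. The Python sketch below shows the usual pattern for making a file durable (write to a temporary file, fsync, rename, fsync the directory). It is an illustration only and makes no claim about how ccsd writes cluster.conf; the helper name `write_durably` and the demo file contents are hypothetical.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and sync them to stable storage before returning."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push the data to disk
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself is durable (POSIX systems).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


# Demo in a scratch directory; the XML content is a made-up placeholder.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "cluster.conf")
write_durably(demo_path, b'<cluster name="demo" config_version="6"/>\n')
```

Without the fsync calls, the data may sit only in the page cache for some time, which is the window in which a fence would discard it.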

Comment 2 Steve Reichard 2008-11-06 14:05:57 UTC
I have been struggling with a situation in which nodes are being fenced under various unexpected circumstances, so your point about the file not being flushed would seem to explain the issue.

I just re-read the man page and see that it states the update applies to the working version; it does not say that the on-disk file is updated.
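
One way to check whether an update reached the on-disk file is to read the config_version attribute from cluster.conf after the update and compare it with the version ccsd logged. A minimal Python sketch follows, assuming the usual layout in which the root `<cluster>` element carries `config_version`. The sample XML and the helper name `read_config_version` are illustrative and not taken from this cluster.

```python
import xml.etree.ElementTree as ET


def read_config_version(xml_text):
    """Return the config_version attribute of the root <cluster> element as an int."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError(f"unexpected root element: {root.tag}")
    return int(root.attrib["config_version"])


# Hypothetical, minimal cluster.conf content used only for illustration.
sample_conf = '<?xml version="1.0"?>\n<cluster name="demo" config_version="6"></cluster>'

on_disk = read_config_version(sample_conf)
expected = 6  # the version the ccsd log message reported
print("on-disk version matches" if on_disk == expected else "on-disk version is stale")
```

On a real node the text would be read from the configuration file (normally /etc/cluster/cluster.conf) on each member, and a mismatch between members would point to the same condition that the CMAN conflict messages report.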

Comment 3 Christine Caulfield 2009-02-16 14:05:58 UTC
For tidiness I'll close this. Feel free to reopen it if it happens again and we can get some more information.