Bug 469899 - reported ccsd updates not effective
Summary: reported ccsd updates not effective
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-11-04 16:59 UTC by Steve Reichard
Modified: 2009-04-16 22:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-16 14:05:58 UTC
Target Upstream Version:
Embargoed:



Description Steve Reichard 2008-11-04 16:59:22 UTC
Description of problem:

Heterogeneous cluster of HP DL580 G5 (Intel 16 core, 64 GB and AMD 8 core, 72 GB)
running RHEL 5.2 AP, recently updated.

This problem may be related to Bugzillas: 469874 and 469888

FYI: rgmanager disabled

The cluster configuration file was updated to remove a test service, then ccs_tool update and cman_tool version -r were issued. The messages file logs the update; however, shortly afterwards the node was fenced. On boot after the fence, a version mismatch was reported. This resolved on the next boot (after an OOM panic).

Nov  3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).

Nov  3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6




Version-Release number of selected component (if applicable):
[root@renoir crash]# cat /etc/redhat-release ;  uname -a
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
Linux renoir.lab.bos.redhat.com 2.6.18-92.1.13.el5xen #1 SMP Thu Sep 4 04:07:08 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@renoir crash]#



How reproducible:

I was unable to force a reproduction, but noticed the following pattern in the logs:

[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete"  messages
Nov  3 08:01:37 renoir ccsd[7230]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 08:07:52 renoir openais[7259]: [CMAN ] Node 2 conflict, remote config version id=1, local=2 
Nov  3 08:11:42 renoir ccsd[7250]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 08:12:37 renoir ccsd[7250]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 08:23:36 renoir openais[7265]: [CMAN ] Node 2 conflict, remote config version id=2, local=4 
Nov  3 10:02:12 renoir ccsd[9752]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 10:03:16 renoir ccsd[9752]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 10:04:09 renoir ccsd[9752]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 13:01:52 renoir ccsd[9604]: Update of cluster.conf complete (version 4 -> 5). 
Nov  3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6). 
Nov  3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6 
Nov  4 11:11:36 renoir ccsd[7271]: Update of cluster.conf complete (version 6 -> 7). 
[root@renoir log]# 


[root@monet log]#  grep -e conflict -e "Update of cluster.conf complete"  messages
Nov  3 08:01:37 monet ccsd[18166]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 08:07:52 monet openais[18173]: [CMAN ] Node 2 conflict, remote config version id=1, local=2 
Nov  3 08:23:36 monet openais[18173]: [CMAN ] Node 1 conflict, remote config version id=4, local=2 
Nov  3 10:02:12 monet ccsd[12218]: Update of cluster.conf complete (version 1 -> 2). 
Nov  3 10:03:16 monet ccsd[12218]: Update of cluster.conf complete (version 2 -> 3). 
Nov  3 10:04:09 monet ccsd[12218]: Update of cluster.conf complete (version 3 -> 4). 
Nov  3 13:01:52 monet ccsd[12218]: Update of cluster.conf complete (version 4 -> 5). 
Nov  3 15:10:50 monet ccsd[8302]: Update of cluster.conf complete (version 5 -> 6). 
Nov  3 15:16:06 monet openais[8309]: [CMAN ] Node 2 conflict, remote config version id=5, local=6 
Nov  4 11:11:32 monet ccsd[8302]: Update of cluster.conf complete (version 6 -> 7). 
Nov  4 11:11:50 monet openais[8309]: [CMAN ] Node 1 conflict, remote config version id=6, local=7 
[root@monet log]# 
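The conflict lines above can be easier to compare across nodes once they are reduced to the node number and the two version numbers. The following is only a sketch: summarize_conflicts is a hypothetical helper name, and it assumes the syslog message format shown in this report (including an optional "messages.1:" filename prefix from grep).

```shell
#!/bin/sh
# Hypothetical helper: reduce CMAN config-version conflict lines (read from
# stdin) to one summary line each. Lines that are not conflicts are ignored.
summarize_conflicts() {
    awk '
    match($0, /Node [0-9]+ conflict, remote config version id=[0-9]+, local=[0-9]+/) {
        # Keep only the matched message, then split it on runs of non-digits:
        # v[2] = node number, v[3] = remote version, v[4] = local version.
        msg = substr($0, RSTART, RLENGTH)
        split(msg, v, /[^0-9]+/)

        # Drop a grep filename prefix such as "messages.1:" from the month.
        mon = $1
        sub(/^.*:/, "", mon)

        # Report which side holds the older configuration version.
        stale = "none"
        if (v[3] + 0 < v[4] + 0) {
            stale = "remote"
        } else if (v[3] + 0 > v[4] + 0) {
            stale = "local"
        }

        printf "%s %s %s %s node=%s remote=%s local=%s stale=%s\n", \
            mon, $2, $3, $4, v[2], v[3], v[4], stale
    }'
}

# Example usage (path as used in this report):
#   summarize_conflicts < /var/log/messages
```

Run on both nodes, this gives one line per conflict, so a node that came back with an older configuration version shows up as stale=remote on its peers.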




Expanding the search to the rotated logs:
[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete"  messages\.*
messages.1:Oct 27 09:25:07 renoir ccsd[6171]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 27 09:54:57 renoir ccsd[6171]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 27 09:56:11 renoir ccsd[6171]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 27 11:08:35 renoir openais[7012]: [CMAN ] Node 2 conflict, remote config version id=5, local=4 
messages.1:Oct 27 11:49:49 renoir ccsd[7027]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 27 11:50:08 renoir ccsd[7027]: Update of cluster.conf complete (version 6 -> 7). 
messages.1:Oct 27 14:07:04 renoir ccsd[7395]: Update of cluster.conf complete (version 7 -> 8). 
messages.1:Oct 28 10:42:50 renoir ccsd[7394]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 28 10:48:22 renoir ccsd[7394]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 28 11:36:25 renoir ccsd[20292]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 28 15:06:45 renoir ccsd[7394]: Update of cluster.conf complete (version 4 -> 5). 
messages.1:Oct 28 15:08:25 renoir ccsd[7394]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 28 15:35:05 renoir openais[7398]: [CMAN ] Node 2 conflict, remote config version id=4, local=6 
messages.1:Oct 29 17:30:01 renoir ccsd[10482]: Update of cluster.conf complete (version 1 -> 2). 


[root@monet log]#  grep -e conflict -e "Update of cluster.conf complete"  messages\.*
messages.1:Oct 27 09:25:07 monet ccsd[31211]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 27 09:54:57 monet ccsd[31211]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 27 09:56:11 monet ccsd[31211]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 27 11:08:42 monet openais[8238]: [CMAN ] Node 1 conflict, remote config version id=4, local=5 
messages.1:Oct 27 11:49:53 monet ccsd[8230]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 27 11:50:08 monet ccsd[8230]: Update of cluster.conf complete (version 6 -> 7). 
messages.1:Oct 27 14:07:04 monet ccsd[8591]: Update of cluster.conf complete (version 7 -> 8). 
messages.1:Oct 28 10:42:50 monet ccsd[8560]: Update of cluster.conf complete (version 1 -> 2). 
messages.1:Oct 28 10:48:22 monet ccsd[8560]: Update of cluster.conf complete (version 2 -> 3). 
messages.1:Oct 28 11:36:25 monet ccsd[23631]: Update of cluster.conf complete (version 3 -> 4). 
messages.1:Oct 28 15:06:55 monet ccsd[31396]: Update of cluster.conf complete (version 4 -> 5). 
messages.1:Oct 28 15:08:36 monet ccsd[31396]: Update of cluster.conf complete (version 5 -> 6). 
messages.1:Oct 28 15:35:14 monet openais[31427]: [CMAN ] Node 2 conflict, remote config version id=4, local=6 
messages.1:Oct 29 17:30:09 monet ccsd[12661]: Update of cluster.conf complete (version 1 -> 2). 








Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Christine Caulfield 2008-11-06 09:36:37 UTC
Some more coeval syslogs would be helpful, ideally not quite as expurgated as those. Also, it would be helpful if the times were synchronised on the systems.

It's hard to make sense of those logs because so much is missing. It might fill up bugzilla but whole logs from before and after an incident (from all systems) are much more useful than carefully grepped logs from a whole day.
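One way to collect the unfiltered log around an incident, rather than grepped lines from a whole day, is to cut a time window out of the messages file on each node. This is a rough sketch: log_window is a hypothetical helper, it assumes the traditional syslog timestamp format seen in this report ("Nov  3 15:10:50"), and it only handles a window within a single day.

```shell
#!/bin/sh
# Hypothetical helper: print every syslog line (from stdin) whose timestamp
# falls on the given month/day and between FROM and TO (HH:MM:SS, inclusive).
# usage: log_window MONTH DAY FROM TO < messages
log_window() {
    awk -v mon="$1" -v day="$2" -v from="$3" -v to="$4" '
        # HH:MM:SS strings are fixed width, so string comparison orders them
        # correctly; appending "" forces a string comparison.
        $1 == mon && $2 + 0 == day + 0 && ($3 "") >= (from "") && ($3 "") <= (to "")
    '
}

# Example usage: everything logged around the version 5 -> 6 update
#   log_window Nov 3 15:05:00 15:20:00 < /var/log/messages > incident.log
```

Because the window is selected by timestamp, the output from different nodes only lines up if their clocks are synchronised.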

Also, if the node was fenced very soon after the ccsd update, it might just be that the new cluster.conf was not flushed to disk.
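To test that theory, the config_version stored in the on-disk file can be compared with the version the logs say was applied. The sketch below is an illustration only: both function names are hypothetical, the default path /etc/cluster/cluster.conf is the usual location, and it assumes the config_version attribute sits on the same line as the opening cluster tag.

```shell
#!/bin/sh
# Hypothetical helper: print the config_version attribute found in a
# cluster.conf file (first match only).
disk_config_version() {
    conf="${1:-/etc/cluster/cluster.conf}"
    sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1
}

# Hypothetical helper: compare the on-disk version with an expected one.
# usage: check_disk_version EXPECTED [FILE]
check_disk_version() {
    expected="$1"
    conf="${2:-/etc/cluster/cluster.conf}"
    actual=$(disk_config_version "$conf")
    if [ "$actual" = "$expected" ]; then
        echo "ok: on-disk config_version=$actual"
    else
        echo "MISMATCH: on-disk config_version=${actual:-unknown}, expected $expected"
        return 1
    fi
}

# Example usage after an update that logged "version 5 -> 6":
#   check_disk_version 6
```

If this reports a mismatch on a node right after ccsd logs a completed update, that would be consistent with the file not yet having reached disk, though it would not prove it.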

Comment 2 Steve Reichard 2008-11-06 14:05:57 UTC
I have been struggling with a situation in which nodes are being fenced under various unexpected circumstances, so your point about the file not being flushed would seem to be the issue.

I just re-read the man page and see that it describes the update as applying to the working version; it does not say that the on-disk file is updated.

Comment 3 Christine Caulfield 2009-02-16 14:05:58 UTC
For tidiness I'll close this. Feel free to reopen it if it happens again and we can get some more information.

