Description of problem:
Heterogeneous cluster of HP DL580 G5 (Intel 16 core, 64 GB and AMD 8 core 72 GB) running RHEL 5.2 AP and recently updated.

This problem may be related to Bugzillas 469874 and 469888.

FYI: rgmanager disabled.

The cluster configuration file was updated to remove a test service. ccs_tool update and cman version -r were issued. The messages file logs the update; however, shortly afterwards the node is fenced. Upon boot after the fence, a version mismatch was reported. This resolved on the next boot (after an OOM panic).

Nov 3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).
Nov 3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).
Nov 3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6

Version-Release number of selected component (if applicable):
[root@renoir crash]# cat /etc/redhat-release ; uname -a
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
Linux renoir.lab.bos.redhat.com 2.6.18-92.1.13.el5xen #1 SMP Thu Sep 4 04:07:08 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@renoir crash]#

How reproducible:
While unable to force a reproduction, this was noticed:

[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete" messages
Nov 3 08:01:37 renoir ccsd[7230]: Update of cluster.conf complete (version 1 -> 2).
Nov 3 08:07:52 renoir openais[7259]: [CMAN ] Node 2 conflict, remote config version id=1, local=2
Nov 3 08:11:42 renoir ccsd[7250]: Update of cluster.conf complete (version 2 -> 3).
Nov 3 08:12:37 renoir ccsd[7250]: Update of cluster.conf complete (version 3 -> 4).
Nov 3 08:23:36 renoir openais[7265]: [CMAN ] Node 2 conflict, remote config version id=2, local=4
Nov 3 10:02:12 renoir ccsd[9752]: Update of cluster.conf complete (version 1 -> 2).
Nov 3 10:03:16 renoir ccsd[9752]: Update of cluster.conf complete (version 2 -> 3).
Nov 3 10:04:09 renoir ccsd[9752]: Update of cluster.conf complete (version 3 -> 4).
Nov 3 13:01:52 renoir ccsd[9604]: Update of cluster.conf complete (version 4 -> 5).
Nov 3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).
Nov 3 15:16:08 renoir openais[7263]: [CMAN ] Node 2 conflict, remote config version id=5, local=6
Nov 4 11:11:36 renoir ccsd[7271]: Update of cluster.conf complete (version 6 -> 7).
[root@renoir log]#

[root@monet log]# grep -e conflict -e "Update of cluster.conf complete" messages
Nov 3 08:01:37 monet ccsd[18166]: Update of cluster.conf complete (version 1 -> 2).
Nov 3 08:07:52 monet openais[18173]: [CMAN ] Node 2 conflict, remote config version id=1, local=2
Nov 3 08:23:36 monet openais[18173]: [CMAN ] Node 1 conflict, remote config version id=4, local=2
Nov 3 10:02:12 monet ccsd[12218]: Update of cluster.conf complete (version 1 -> 2).
Nov 3 10:03:16 monet ccsd[12218]: Update of cluster.conf complete (version 2 -> 3).
Nov 3 10:04:09 monet ccsd[12218]: Update of cluster.conf complete (version 3 -> 4).
Nov 3 13:01:52 monet ccsd[12218]: Update of cluster.conf complete (version 4 -> 5).
Nov 3 15:10:50 monet ccsd[8302]: Update of cluster.conf complete (version 5 -> 6).
Nov 3 15:16:06 monet openais[8309]: [CMAN ] Node 2 conflict, remote config version id=5, local=6
Nov 4 11:11:32 monet ccsd[8302]: Update of cluster.conf complete (version 6 -> 7).
Nov 4 11:11:50 monet openais[8309]: [CMAN ] Node 1 conflict, remote config version id=6, local=7
[root@monet log]#

Expanding --

[root@renoir log]# grep -e conflict -e "Update of cluster.conf complete" messages\.*
messages.1:Oct 27 09:25:07 renoir ccsd[6171]: Update of cluster.conf complete (version 1 -> 2).
messages.1:Oct 27 09:54:57 renoir ccsd[6171]: Update of cluster.conf complete (version 2 -> 3).
messages.1:Oct 27 09:56:11 renoir ccsd[6171]: Update of cluster.conf complete (version 3 -> 4).
messages.1:Oct 27 11:08:35 renoir openais[7012]: [CMAN ] Node 2 conflict, remote config version id=5, local=4
messages.1:Oct 27 11:49:49 renoir ccsd[7027]: Update of cluster.conf complete (version 5 -> 6).
messages.1:Oct 27 11:50:08 renoir ccsd[7027]: Update of cluster.conf complete (version 6 -> 7).
messages.1:Oct 27 14:07:04 renoir ccsd[7395]: Update of cluster.conf complete (version 7 -> 8).
messages.1:Oct 28 10:42:50 renoir ccsd[7394]: Update of cluster.conf complete (version 1 -> 2).
messages.1:Oct 28 10:48:22 renoir ccsd[7394]: Update of cluster.conf complete (version 2 -> 3).
messages.1:Oct 28 11:36:25 renoir ccsd[20292]: Update of cluster.conf complete (version 3 -> 4).
messages.1:Oct 28 15:06:45 renoir ccsd[7394]: Update of cluster.conf complete (version 4 -> 5).
messages.1:Oct 28 15:08:25 renoir ccsd[7394]: Update of cluster.conf complete (version 5 -> 6).
messages.1:Oct 28 15:35:05 renoir openais[7398]: [CMAN ] Node 2 conflict, remote config version id=4, local=6
messages.1:Oct 29 17:30:01 renoir ccsd[10482]: Update of cluster.conf complete (version 1 -> 2).

[root@monet log]# grep -e conflict -e "Update of cluster.conf complete" messages\.*
messages.1:Oct 27 09:25:07 monet ccsd[31211]: Update of cluster.conf complete (version 1 -> 2).
messages.1:Oct 27 09:54:57 monet ccsd[31211]: Update of cluster.conf complete (version 2 -> 3).
messages.1:Oct 27 09:56:11 monet ccsd[31211]: Update of cluster.conf complete (version 3 -> 4).
messages.1:Oct 27 11:08:42 monet openais[8238]: [CMAN ] Node 1 conflict, remote config version id=4, local=5
messages.1:Oct 27 11:49:53 monet ccsd[8230]: Update of cluster.conf complete (version 5 -> 6).
messages.1:Oct 27 11:50:08 monet ccsd[8230]: Update of cluster.conf complete (version 6 -> 7).
messages.1:Oct 27 14:07:04 monet ccsd[8591]: Update of cluster.conf complete (version 7 -> 8).
messages.1:Oct 28 10:42:50 monet ccsd[8560]: Update of cluster.conf complete (version 1 -> 2).
messages.1:Oct 28 10:48:22 monet ccsd[8560]: Update of cluster.conf complete (version 2 -> 3).
messages.1:Oct 28 11:36:25 monet ccsd[23631]: Update of cluster.conf complete (version 3 -> 4).
messages.1:Oct 28 15:06:55 monet ccsd[31396]: Update of cluster.conf complete (version 4 -> 5).
messages.1:Oct 28 15:08:36 monet ccsd[31396]: Update of cluster.conf complete (version 5 -> 6).
messages.1:Oct 28 15:35:14 monet openais[31427]: [CMAN ] Node 2 conflict, remote config version id=4, local=6
messages.1:Oct 29 17:30:09 monet ccsd[12661]: Update of cluster.conf complete (version 1 -> 2).

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Some more coeval syslogs would be helpful, ideally not quite as expurgated as those. Also, it would be helpful if the times were synchronised on the systems. It's hard to make sense of those logs because so much is missing. It might fill up bugzilla, but whole logs from before and after an incident (from all systems) are much more useful than carefully grepped logs from a whole day. Also, if the node was fenced very soon after the ccsd update, it might just be that the new cluster.conf was not flushed to disk.
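To make logs from several nodes easier to read side by side, the lines could be merged into a single timeline. The following is only a rough sketch: it assumes the syslog line format shown in the description, the names `parse_line` and `merge_timeline` are made up for illustration, and the year (2008 here) is an assumption because syslog timestamps do not carry one. It is only meaningful if the node clocks are synchronised.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Matches lines such as
#   Nov 3 15:10:50 renoir ccsd[7221]: Update of cluster.conf complete (version 5 -> 6).
# with an optional "messages.1:" style prefix added by grep over several files.
LINE_RE = re.compile(
    r"^(?:[\w.]+:)?"
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+\w+\[\d+\]:\s+(?P<msg>.*)$"
)
UPDATE_RE = re.compile(r"Update of cluster\.conf complete \(version (\d+) -> (\d+)\)")
CONFLICT_RE = re.compile(
    r"Node (\d+) conflict, remote config version id=(\d+), local=(\d+)"
)


class Event(NamedTuple):
    when: datetime
    host: str
    kind: str    # "update" or "conflict"
    detail: str


def parse_line(line: str, year: int = 2008) -> Optional[Event]:
    """Return an Event for update/conflict lines, or None for anything else.

    The year is an assumption supplied by the caller.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    update = UPDATE_RE.search(m["msg"])
    if update:
        return Event(when, m["host"], "update", f"{update[1]} -> {update[2]}")
    conflict = CONFLICT_RE.search(m["msg"])
    if conflict:
        detail = f"node {conflict[1]}: remote={conflict[2]} local={conflict[3]}"
        return Event(when, m["host"], "conflict", detail)
    return None


def merge_timeline(*logs: Iterable[str]) -> List[Event]:
    """Merge lines from any number of node logs into one time-ordered list."""
    events = [ev for log in logs for ev in map(parse_line, log) if ev is not None]
    return sorted(events, key=lambda ev: (ev.when, ev.host))
```

Passing the lines of each node's messages file to `merge_timeline` yields one ordered list in which each conflict can be read next to the update that preceded it.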
I have been struggling with a situation in which nodes are being fenced under various unexpected circumstances, so your point about the file not being flushed would seem to be the issue. I just re-read the man page and see that it describes the update as applying to the working version; it does not state that the on-disk file is updated.
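One way to check for this after an update would be to compare the `config_version` attribute of the on-disk cluster.conf on every node before anything else happens to the cluster. The sketch below is an illustration only: it assumes the file's root element is `<cluster ... config_version="N">`, and the helper names `config_version` and `stale_nodes` are hypothetical. How the file contents are collected from each node is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def config_version(xml_text: str) -> int:
    """Read config_version from the root <cluster> element of cluster.conf text."""
    root = ET.fromstring(xml_text)
    if root.tag != "cluster":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    return int(root.attrib["config_version"])


def stale_nodes(configs: Dict[str, str]) -> Dict[str, int]:
    """Given host -> cluster.conf text, return hosts whose on-disk version lags.

    A host is considered stale if its version is lower than the highest
    version seen on any host.
    """
    versions = {host: config_version(text) for host, text in configs.items()}
    newest = max(versions.values())
    return {host: v for host, v in versions.items() if v < newest}
```

For example, feeding it the contents of /etc/cluster/cluster.conf read from each node would return an empty dict when all on-disk copies agree, and the lagging hosts with their versions otherwise.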
For tidiness I'll close this. Feel free to reopen it if it happens again and we can get some more information.