Description of problem:
After modifying cluster.conf and distributing it to the other nodes with ccs_tool update, cluster.conf changes on all the nodes, but the change is not applied in the running cluster.

Version-Release number of selected component (if applicable):
cman-2.0.115-96.el5_8.3.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Modify cluster.conf (add a new service).
2. Propagate the changes with ccs_tool update.
3. Check cluster.conf on all nodes and whether the change took effect (new service present in clustat output).

Actual results:
The service is not added.

Expected results:
The new service is visible.

Additional info:
The rgmanager dumps (attached to case 00724487) show something interesting:

$ grep "SAP-BOP" sapproclt0*
sapproclt01_rgmanager-dump.mN27ID: rg="service:SAP-BOP", View: 8, Size: 96, Address: 0x2aaab05c3060
sapproclt02_rgmanager-dump.HoDb9M: rg="service:SAP-BOP", View: 8, Size: 96, Address: 0x2aaab0005740
sapproclt03_rgmanager-dump.OpsGLz: rg="service:SAP-BOP", View: 8, Size: 96, Address: 0x4a2eb60
sapproclt04_rgmanager-dump.1P93BU: rg="service:SAP-BOP", View: 8, Size: 96, Address: 0x2aaaac000bb0
sapproclt04_rgmanager-dump.1P93BU: name = SAP-BOP [ primary unique required ]
sapproclt04_rgmanager-dump.1P93BU: name = "SAP-BOP";

sapproclt04 does see the resource and the other nodes do not. The reason is that they are on a different configuration version:

$ head -n 1 sapproclt0*
==> sapproclt01_rgmanager-dump.mN27ID <==
Cluster configuration version 184

==> sapproclt02_rgmanager-dump.HoDb9M <==
Cluster configuration version 184

==> sapproclt03_rgmanager-dump.OpsGLz <==
Cluster configuration version 184

==> sapproclt04_rgmanager-dump.1P93BU <==
Cluster configuration version 187

All the cluster nodes appear to have received the updated configuration file:

$ grep -h "Update of cluster.conf complete" */var/log/messages | sort
Oct 22 22:00:06 sapproclt01 ccsd[15551]: Update of cluster.conf complete (version 185 -> 186).
Oct 22 22:00:06 sapproclt02 ccsd[17157]: Update of cluster.conf complete (version 185 -> 186).
Oct 22 22:00:06 sapproclt03 ccsd[15529]: Update of cluster.conf complete (version 185 -> 186).
Oct 22 22:00:06 sapproclt04 ccsd[15664]: Update of cluster.conf complete (version 185 -> 186).
Oct 22 22:31:06 sapproclt01 ccsd[15551]: Update of cluster.conf complete (version 186 -> 187).
Oct 22 22:31:06 sapproclt02 ccsd[17157]: Update of cluster.conf complete (version 186 -> 187).
Oct 22 22:31:06 sapproclt03 ccsd[15529]: Update of cluster.conf complete (version 186 -> 187).
Oct 22 22:31:06 sapproclt04 ccsd[15664]: Update of cluster.conf complete (version 186 -> 187).

-----

There are only two ways this can occur:
1) The file was not propagated to all the nodes in the cluster, which does not appear to be the case.
2) There is a bug that prevented it from propagating.
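For anyone repeating the ccsd log check on a larger set of sosreports, the following is a minimal sketch of the same idea in Python. It assumes the ccsd message format is exactly as quoted in this report; the function name last_ccsd_version is hypothetical and not part of any cluster tool.

```python
import re

# Matches the ccsd message quoted in this report, for example:
# "Oct 22 22:31:06 sapproclt04 ccsd[15664]: Update of cluster.conf complete (version 186 -> 187)."
# The message format is an assumption taken from the log excerpts above.
UPDATE_RE = re.compile(
    r"(?P<node>\S+) ccsd\[\d+\]: "
    r"Update of cluster\.conf complete \(version (?P<old>\d+) -> (?P<new>\d+)\)"
)


def last_ccsd_version(lines):
    """Return {node: highest config version ccsd reported as applied}."""
    latest = {}
    for line in lines:
        m = UPDATE_RE.search(line)
        if m is None:
            continue  # not a ccsd update message
        node = m.group("node")
        new = int(m.group("new"))
        latest[node] = max(new, latest.get(node, 0))
    return latest


if __name__ == "__main__":
    sample = [
        "Oct 22 22:00:06 sapproclt01 ccsd[15551]: Update of cluster.conf complete (version 185 -> 186).",
        "Oct 22 22:31:06 sapproclt01 ccsd[15551]: Update of cluster.conf complete (version 186 -> 187).",
        "Oct 22 22:31:06 sapproclt04 ccsd[15664]: Update of cluster.conf complete (version 186 -> 187).",
    ]
    for node, version in sorted(last_ccsd_version(sample).items()):
        print(node, version)
```

A node missing from the result, or one reporting a lower version than its peers, would point to a propagation problem rather than to rgmanager.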
Can you please check that the on-disk version of cluster.conf on all nodes is at 187, and collect all of /var/log/messages from all the nodes? The ccsd update appears to have succeeded, but cman is not using the configuration. We will need the full logs to try to understand why. Copies of cluster.conf at versions 186 and 187 might also be useful.
It is also worth checking that cman has seen the update; it might be only rgmanager that is using the older version. Comparing the output of "cman_tool status" with the rgmanager dump will clear this up.
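The comparison can be scripted. The sketch below is only an illustration: it assumes that "cman_tool status" prints a "Config Version: N" line and that the rgmanager dump begins with "Cluster configuration version N", which are the formats seen in the data collected for this report. All function names are hypothetical.

```python
import re


def cman_config_version(status_text):
    """Extract N from a 'Config Version: N' line of cman_tool status output."""
    m = re.search(r"^\s*Config Version:\s*(\d+)", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def rgmanager_config_version(dump_text):
    """Extract N from the 'Cluster configuration version N' line of an rgmanager dump."""
    m = re.search(r"^\s*Cluster configuration version\s+(\d+)", dump_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def rgmanager_is_stale(status_text, dump_text):
    """Return True when rgmanager runs an older config version than cman."""
    cman_v = cman_config_version(status_text)
    rg_v = rgmanager_config_version(dump_text)
    if cman_v is None or rg_v is None:
        raise ValueError("could not find a config version in the input")
    return rg_v < cman_v


if __name__ == "__main__":
    status = "Config Version: 187\nNode ID: 1\n"
    dump = "Cluster configuration version 184\n"
    print("rgmanager stale:", rgmanager_is_stale(status, dump))
```

If this returns True on a node, cman has the update and only rgmanager is behind.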
I don't have access to the ticket. Can you please give me the information I asked for in comment #1, and also the information Chrissie asked for in comment #2?
This looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=822104, but fencing is not in progress. rgmanager is simply stuck at config version 184, while the on-disk configuration (and cman/ccsd) is at 187.

Interestingly, the rgmanager daemon has not produced a single log line (despite being configured with log_level="7") since:

Aug 23 22:26:45 sapproclt01 clurgmgrd: [18949]: <notice> Getting status
Aug 19 03:56:14 sapproclt02 clurgmgrd: [20392]: <notice> Getting status
Aug 23 22:21:57 sapproclt03 clurgmgrd: [18512]: <notice> Getting status
Aug 23 23:01:19 sapproclt04 clurgmgrd[18930]: <info> Starting changed resources.
[huge no log gap]
Oct 22 22:00:16 sapproclt04 clurgmgrd[18930]: <notice> Reconfiguring ...
Oct 22 22:31:21 sapproclt04 clurgmgrd: [18930]: <notice> Getting status
[no more logs]

Logging appears to have stopped at the same time as the following update:

sapproclt01 messages.9:Aug 23 23:01:03 sapproclt01 ccsd[15551]: Update of cluster.conf complete (version 184 -> 185).
sapproclt02 ccsd has not logged the 184 -> 185 update, but based on the rgmanager dump the config is live.
sapproclt03 messages.9:Aug 23 23:01:03 sapproclt03 ccsd[15529]: Update of cluster.conf complete (version 184 -> 185).
sapproclt04 messages.9:Aug 23 23:01:03 sapproclt04 ccsd[15664]: Update of cluster.conf complete (version 184 -> 185).

Assuming that the cluster.conf files stored in the sosreports are the same ones that were pushed to production, the differences between 184 and 185 are confined to a few <fs/> services.

diff -u sapproclt03-92388/etc/cluster/cluster.conf.230812 sapproclt04-380775/etc/cluster/cluster.conf.221012
(In reply to comment #4)
> I don't have access to the ticket, can you please give me the information I
> asked for in comment #1 and also for Chrissie in comment #2 ?

The on-disk version of cluster.conf is updated on all nodes:

$ cat sapproclt0*-*/etc/cluster/cluster.conf|grep config_version
<cluster alias="cl_PepeJeans" config_version="187" name="cl_PepeJeans">
<cluster alias="cl_PepeJeans" config_version="187" name="cl_PepeJeans">
<cluster alias="cl_PepeJeans" config_version="187" name="cl_PepeJeans">
<cluster alias="cl_PepeJeans" config_version="187" name="cl_PepeJeans">

cman has the latest version on all nodes:

$ cat sapproclt0*-*/sos_commands/cluster/cman_tool_status | egrep "Node ID|Config Version"
Config Version: 187
Node ID: 1
Config Version: 187
Node ID: 2
Config Version: 187
Node ID: 3
Config Version: 187
Node ID: 4

rgmanager doesn't:

$ cat sapproclt0*_rgmanager-dump*|grep version
Cluster configuration version 184
Cluster configuration version 184
Cluster configuration version 184
Cluster configuration version 187
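As a rough sketch of how the three checks above could be combined per node, the Python below reads config_version from the cluster.conf root element and reports which component lags behind the newest version seen. The helper names and the dictionary layout are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def on_disk_config_version(cluster_conf_xml):
    """Read the config_version attribute from the <cluster> root element."""
    root = ET.fromstring(cluster_conf_xml)
    return int(root.attrib["config_version"])


def lagging_components(versions):
    """Given {component: version}, return the components below the newest version."""
    newest = max(versions.values())
    return sorted(name for name, v in versions.items() if v < newest)


if __name__ == "__main__":
    # Minimal stand-in for a real cluster.conf; only the root attributes matter here.
    conf = '<cluster alias="cl_PepeJeans" config_version="187" name="cl_PepeJeans"></cluster>'
    node_versions = {
        "disk": on_disk_config_version(conf),
        "cman": 187,        # from cman_tool status
        "rgmanager": 184,   # from the rgmanager dump
    }
    print("lagging:", lagging_components(node_versions))
```

With the values reported for sapproclt01 through sapproclt03, only rgmanager would be listed as lagging.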
*** Bug 822104 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1316.html