Description of problem:
If qdiskd is used with a quorum disk, rgmanager is not able to start services. Without starting qdiskd, rgmanager works fine.

Version-Release number of selected component (if applicable):
RHEL 5: cman-2.0.60-1.el5, rgmanager-2.0.23-1 (x86_64)

How reproducible:
Use a quorum disk

Steps to Reproduce:
1. configure quorum disk in cluster.conf
2. start cman
3. start qdiskd
4. start rgmanager

Actual results:
No services running; clustat hangs when starting; system-config-cluster hangs when starting.
/var/log/messages:
Mar 30 14:09:27 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts1
Mar 30 14:09:43 pg-ba-001 clurgmgrd[20629]: <err> #34: Cannot get status for service service:pg-ba-vts2

Expected results:
Running services.

Additional info:
I attached my cluster.conf. Registration of quorum succeeds in cman.
Created attachment 151269 [details] Cluster Configuration File
Additional info:

clurgmgrd appears to be suffering the same fate as ccs_tool in bug #223519: it treats the quorum disk as an actual node. When clurgmgrd first starts, it attempts to contact the quorum disk "node" to determine the status of the services it's running. This times out, causing an "abort":

[12453] info: State change: Local UP
[12453] info: State change: sys-b UP
[12453] info: State change: /dev/dm-3 UP  #Note: Quorum Disk
... aight, need responses from 3 guys
VF: Push 2.12453 #1 (X#00020001)
VF: Checking for consensus...
...
VF: YES
VF: YES
VF: Timed out waiting for 1 responses
VF: Broadcasting ABORT (X#00020002)
VF: Aborted!

I was able to construct a proof of concept by adding code to rgmanager/src/daemons/main.c:membership_update() that sets cn_member to 0 for the cml_members element which has a cn_nodeid of 0. Afterwards, the resource manager appears to function as expected. Additionally, clustat no longer hangs with a "Timed out waiting for a response from Resource Group Manager" message.

I hope that this information assists in leading to a proper patch, as mine was a rather brute-force solution.
Created attachment 152699 [details] Fix

Hi, this should fix it.
Actually, it sounds like exactly what you did, but in a different location. ;)
Thanks for that! Will there be an official errata for this problem?
I can't confirm one way or the other at this point, but it looks like it will be in update 1 for certain.
Fixing Product Name. Cluster Suite was integrated into Enterprise Linux as of version 5.0.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Hi! Is there any news on whether this fix will be in an upcoming erratum or in the next update for RHEL 5? Regards, Robert
Update 1 for RHEL5 :)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0580.html