Description of problem:
The function local_node_name in /resources/utils/member_util.sh does not check whether magma_tool failed, and can return an empty string.

Version-Release number of selected component (if applicable):
rgmanager-1.9.87-1.el4

How reproducible:
Every time magma_tool returns an error. Deterministic.

Steps to Reproduce:
1. Activate HALVM with a proper configuration.
2. cman_tool leave force
3. Notice the errors in the log reporting a wrong configuration for HALVM.

Actual results:
When magma_tool fails (and one could argue that the cluster is not working at all), scripts might misinterpret the value returned by the function local_node_name.

Expected results:
local_node_name should return 2 if there is a problem when processing the output of magma_tool.

Additional info:
This issue is low priority because it happens most often on a cluster node that is already outside the cluster and/or cannot access magma. The problem can occur in many situations, namely when a machine has been disconnected from the cluster. The node will certainly be fenced, but the errors below give the wrong impression that the setup of the cluster is wrong:

Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> WARNING: An improper setup can cause data corruption!
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> HA LVM: Improper setup detected (sanitized input)
Aug 10 09:06:29 node2 kernel: CMAN: sendmsg failed: -22
Aug 10 09:06:49 node2 last message repeated 4 times
Aug 10 09:06:49 node2 kernel: CMAN: No functional network interfaces, leaving cluster
Aug 10 09:06:49 node2 kernel: CMAN: sendmsg failed: -22
Aug 10 09:06:49 node2 kernel: CMAN: sendmsg failed: -22
Aug 10 09:06:49 node2 kernel: CMAN: we are leaving the cluster.
Aug 10 09:06:49 node2 kernel: WARNING: dlm_emergency_shutdown
Aug 10 09:06:49 node2 clurgmgrd[7365]: <warning> #67: Shutting down uncleanly
Aug 10 09:06:49 node2 clurgmgrd[7365]: <debug> Emergency stop of cluster_cible_BDD
Aug 10 09:06:49 node2 ccsd[7262]: Cluster manager shutdown. Attemping to reconnect...
Aug 10 09:06:49 node2 kernel: WARNING: dlm_emergency_shutdown finished 1
Aug 10 09:06:49 node2 kernel: SM: 00000003 sm_stop: SG still joined
Aug 10 09:06:49 node2 udev[12286]: removing device node '/dev/misc/dlm_Magma'
Aug 10 09:06:49 node2 udevd[2353]: udev done!
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> stop: Could not match /dev/VG/LV with a real device
Aug 10 09:06:49 node2 clurgmgrd[7365]: <notice> stop on fs "FS" returned 2 (invalid argument(s))
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> HA LVM: Improper setup detected
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> - @ missing from "volume_list" in lvm.conf
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> WARNING: An improper setup can cause data corruption!
Aug 10 09:06:50 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
Aug 10 09:06:50 node2 clurgmgrd: [7365]: <err> Unable to determine cluster node name
Aug 10 09:06:50 node2 ccsd[7262]: Cluster is not quorate. Refusing connection.
..
Aug 10 09:06:51 node2 qdiskd[7336]: <err> cman_dispatch: Host is down
Aug 10 09:06:51 node2 qdiskd[7336]: <err> Halting qdisk operations

Impact is therefore low.
Created attachment 357019: initial patch to return 2 when magma_tool fails.

Proposing the attached patch to fix the problem. Arguably, HALVM should also sanity-check its input and reject the output of local_node_name if it is empty.

Eduardo.
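For illustration only, the behaviour requested above could look roughly like the sketch below. This is not the attached patch and not the actual member_util.sh code: the function names local_node_name_checked and require_node_name, the MAGMA_TOOL_CMD override, the "localname" subcommand and the "Local node name = <name>" output format are all assumptions made so that the example is self-contained.

```shell
#!/bin/bash
# Sketch only: NOT the real member_util.sh and NOT the attached patch.
# Assumptions (hypothetical): MAGMA_TOOL_CMD lets the tool be replaced
# for testing; "localname" and the "... = <name>" output format are guesses.

local_node_name_checked()
{
	local out rc

	# Capture both the output and the exit status of the tool.
	out=$(${MAGMA_TOOL_CMD:-magma_tool} localname 2>/dev/null)
	rc=$?
	if [ $rc -ne 0 ]; then
		# The tool failed (or is missing): report it, print nothing.
		return 2
	fi

	# Keep only what follows the last " = " (assumed format).
	out=${out##* = }
	if [ -z "$out" ]; then
		# Never hand an empty node name back to the caller.
		return 2
	fi

	echo "$out"
	return 0
}

# Hypothetical caller-side sanity check, as suggested for HALVM:
# reject a failed lookup or an empty name instead of using it.
require_node_name()
{
	local name

	name=$(local_node_name_checked) || return 2
	[ -n "$name" ] || return 2
	echo "$name"
}
```

With a guard like require_node_name, a caller can stop with "unable to determine cluster node name" rather than going on to compare an empty name against the volume_list entry in lvm.conf.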
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=1ad2e0a85482ab86e63613a345f9a18bcdc42cba
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Previously, the function local_node_name in /resources/utils/member_util.sh did not properly check if magma_tool failed and could return an empty string. With this update, this issue is resolved.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0264.html