Bug 516758 - rgmanager: local_node_name does not check if magma_tool failed.
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware: All Linux
Priority: low
Severity: low
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2009-08-11 08:26 EDT by Eduardo Damato
Modified: 2011-02-16 10:07 EST (History)
Fixed In Version: rgmanager-1.9.88-1.el4
Doc Type: Bug Fix
Doc Text:
Previously, the function local_node_name in /resources/utils/member_util.sh did not properly check if magma_tool failed and could return an empty string. With this update, this issue is resolved.
Last Closed: 2011-02-16 10:07:14 EST

External Trackers:
Tracker: Red Hat Product Errata
ID: RHSA-2011:0264
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: rgmanager security and bug fix update
Last Updated: 2011-02-16 10:07:04 EST

Description Eduardo Damato 2009-08-11 08:26:27 EDT
Description of problem:

The function local_node_name in /resources/utils/member_util.sh does not check whether magma_tool failed and can return an empty string.

Version-Release number of selected component (if applicable):

rgmanager-1.9.87-1.el4

How reproducible:

Every time magma_tool returns an error; the behavior is deterministic.

Steps to Reproduce:
1. Activate HALVM with a proper configuration.
2. Run "cman_tool leave force".
3. Notice the errors in the log reporting a wrong HALVM configuration.
  
Actual results:

When magma_tool fails (and one could argue that the cluster is not working at all at that point), scripts that call local_node_name might misinterpret the value it returns.

Expected results:

local_node_name should return 2 if there is a problem processing the output of magma_tool.
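
The expected behavior can be sketched roughly as follows. This is not the attached patch, only an illustration. It assumes that "magma_tool localname" prints a line of the form "Local Node Name = <name>" and exits non-zero when it cannot reach the cluster; both the subcommand and the output format are assumptions here.

```shell
#!/bin/bash
# Sketch only, not the shipped fix.
# Assumption: "magma_tool localname" prints a line such as
#   Local Node Name = node2
# and exits non-zero when the cluster is unreachable.

local_node_name()
{
    local out name

    # Capture the output and test the exit status in one step,
    # so a magma_tool failure is not silently ignored.
    if ! out=$(magma_tool localname 2>/dev/null); then
        return 2
    fi

    # Keep only the value after " = " on the first "Local" line.
    name=$(printf '%s\n' "$out" | sed -n 's/^Local.* = //p' | head -n 1)

    # Missing or empty output is an error, not a valid empty name.
    if [ -z "$name" ]; then
        return 2
    fi

    printf '%s\n' "$name"
    return 0
}
```

With this shape, a caller can distinguish "no name available" (return code 2, nothing printed) from a valid node name, instead of receiving an empty string with a success status.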

Additional info:

This issue is low priority because it most often happens on a cluster node that is already outside the cluster and/or cannot access magma.

The problem can occur in many situations, notably when a machine has been disconnected from the cluster. The node will certainly be fenced, but the errors below give the false impression that the cluster is misconfigured:

Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> WARNING: An improper setup can cause data corruption! 
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> HA LVM:  Improper setup detected 

(sanitized input)


Aug 10 09:06:29 node2 kernel: CMAN: sendmsg failed: -22 
Aug 10 09:06:49 node2 last message repeated 4 times 
Aug 10 09:06:49 node2 kernel: CMAN: No functional network interfaces, leaving cluster 
Aug 10 09:06:49 node2 kernel: CMAN: sendmsg failed: -22 
Aug 10 09:06:49 node2 kernel: CMAN: sendmsg failed: -22 
Aug 10 09:06:49 node2 kernel: CMAN: we are leaving the cluster. 
Aug 10 09:06:49 node2 kernel: WARNING: dlm_emergency_shutdown 
Aug 10 09:06:49 node2 clurgmgrd[7365]: <warning> #67: Shutting down uncleanly 
Aug 10 09:06:49 node2 clurgmgrd[7365]: <debug> Emergency stop of cluster_cible_BDD 
Aug 10 09:06:49 node2 ccsd[7262]: Cluster manager shutdown.  Attemping to reconnect... 
Aug 10 09:06:49 node2 kernel: WARNING: dlm_emergency_shutdown finished 1 
Aug 10 09:06:49 node2 kernel: SM: 00000003 sm_stop: SG still joined 
Aug 10 09:06:49 node2 udev[12286]: removing device node '/dev/misc/dlm_Magma' 
Aug 10 09:06:49 node2 udevd[2353]: udev done! 
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused 
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> stop: Could not match /dev/VG/LV with a real device 
Aug 10 09:06:49 node2 clurgmgrd[7365]: <notice> stop on fs "FS" returned 2 (invalid argument(s)) 
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused 
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> HA LVM:  Improper setup detected 
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
Aug 10 09:06:49 node2 ccsd[7262]: Error while processing connect: Connection refused 
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> - @ missing from "volume_list" in lvm.conf 
Aug 10 09:06:49 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
Aug 10 09:06:49 node2 clurgmgrd: [7365]: <err> WARNING: An improper setup can cause data corruption! 
Aug 10 09:06:50 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
Aug 10 09:06:50 node2 clurgmgrd: [7365]: <err> Unable to determine cluster node name 
Aug 10 09:06:50 node2 ccsd[7262]: Cluster is not quorate.  Refusing connection. 
.. 
Aug 10 09:06:51 node2 qdiskd[7336]: <err> cman_dispatch: Host is down 
Aug 10 09:06:51 node2 qdiskd[7336]: <err> Halting qdisk operations 
 
Impact is therefore low.
Comment 2 Eduardo Damato 2009-08-11 08:31:12 EDT
Created attachment 357019
Initial patch to return 2 when magma_tool fails.


I propose the attached patch to fix the problem. Arguably, HALVM should also sanity-check its input and reject the output of local_node_name if it is empty.

Eduardo.
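
The caller-side sanity check suggested above could look something like the sketch below. require_node_name is a hypothetical helper, not part of rgmanager, and it assumes a local_node_name function is already defined (as in member_util.sh). The error text reuses the "Unable to determine cluster node name" message seen in the log.

```shell
#!/bin/bash
# Sketch only. require_node_name is a hypothetical helper name;
# it expects local_node_name to be defined elsewhere.

require_node_name()
{
    local name rc=0

    # Record the exit status without aborting under "set -e".
    name=$(local_node_name) || rc=$?

    # Reject both a failing exit status and an empty result.
    if [ "$rc" -ne 0 ] || [ -z "$name" ]; then
        echo "Unable to determine cluster node name" >&2
        return 1
    fi

    printf '%s\n' "$name"
}
```

A resource script could then call node=$(require_node_name) and stop on failure before running any check that depends on the node name, which would avoid reporting an improper setup when the real problem is that the node has left the cluster.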
Comment 10 Florian Nadge 2011-01-03 09:06:21 EST
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the function local_node_name in /resources/utils/member_util.sh did not properly check if magma_tool failed and could return an empty string. With this update, this issue is resolved.
Comment 11 errata-xmlrpc 2011-02-16 10:07:14 EST
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0264.html
