Description of problem:

When cluster node A detects that the other node B has missed qdisk updates, it sends an eviction notice to B. On node B, cman is killed, as shown in the logs. When rgmanager is stopped, the errors "unable to find cluster node name" and "HA LVM: Improper setup detected" are logged. Please confirm that once the cman components have been killed on the affected node, no corosync/cman commands will work. Is that correct?

We checked the RHEL HA resource agent shell script sources to verify the following error messages found in the log:

rgmanager[92694]: [lvm] HA LVM: Improper setup detected
rgmanager[92704]: [lvm] * @ missing from "volume_list" in lvm.conf
rgmanager[92719]: [lvm] Owner of VG_DB/lv_db is not in the cluster

According to /usr/share/cluster/lvm.sh, if 'volume_list' does not contain the cluster member name, the agent logs the errors above:

    if ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
        ocf_log err "HA LVM: Improper setup detected"
        ocf_log err "* @$(local_node_name) missing from \"volume_list\" in lvm.conf"
        return $OCF_ERR_GENERIC
    fi

The cluster member name is obtained from the local_node_name function in /usr/share/cluster/utils/member_util.sh:

    local_node_name() {
    ...
    ...
        if which cman_tool &> /dev/null; then
            # Use cman_tool
            line=$(cman_tool status | grep -i "Node name: $1")
            [ -n "$line" ] || return 1
            echo ${line/*name: /}
            return 0
        fi
    ...
    ...
        return 1
    }

Because node A had already detected that node B missed qdisk updates, A evicted the other node, which caused the cman components to be killed:

A:
qdiskd[5904]: Writing eviction notice for node 2
qdiskd[5904]: Node 2 evicted

B:
corosync[5860]: cman killed by node 1 because we were killed by cman_tool or other application
rgmanager[8032]: #67: Shutting down uncleanly
fenced[6619]: cluster is down, exiting

Since the cman components were killed on DCN-02, no cman-related commands work (cman_tool, ccs, etc.) [PLEASE CONFIRM THIS!] and the other cluster processes were also exiting. Before the rgmanager process exited (uncleanly), it attempted to stop all cluster resources. Stopping the FS resource succeeded first. During the LVM stop, the cluster member name was looked up via the cman_tool command, which produced no output, and so the HA-LVM error messages above were logged on the DCN-02 node.

Even though the "HA LVM: Improper setup detected" error may be expected behaviour with respect to the existing resource agent script (lvm.sh), it looks like a bug in the context of LVM resource deactivation: the cman/corosync components had already been killed, so the commands that depend on them will always return nothing.

Version-Release number of selected component (if applicable):
resource-agents-3.9.2-40.el6.x86_64

How reproducible:

Steps to Reproduce:
1. Node A evicts node B when B misses qdisk updates.
2. The cman components are killed on node B.
3. rgmanager starts shutting down. The file system resource is stopped successfully, but stopping (deactivating) the HA-LVM resource fails because the cman processes were killed before rgmanager was stopped during the node eviction.

Actual results:
rgmanager[92694]: [lvm] HA LVM: Improper setup detected
rgmanager[92704]: [lvm] * @ missing from "volume_list" in lvm.conf

Expected results:
The HA-LVM resource should be stopped successfully.

Additional info:
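For reference, lvm.sh expects the local node name to appear as a tag in activation/volume_list in /etc/lvm/lvm.conf. A minimal example of the expected setup is below; the root VG name and node name are placeholders, not values taken from our configuration:

    activation {
        # Only the root VG and VGs tagged with this node's name may be
        # activated locally; lvm.sh greps this list for the name returned
        # by local_node_name().
        volume_list = [ "vg_root", "@node-B-hostname" ]
    }

On a healthy node the grep in lvm.sh matches the "@<node name>" entry. On DCN-02 the grep could not match anything, because local_node_name() returned an empty string once cman was gone.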
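To illustrate where we think the stop path goes wrong, the following is only a sketch (not the shipped lvm.sh code) of a guard that would tolerate a dead cman during deactivation:

    # Sketch only, assuming the agent may be asked to stop after cman has
    # already been killed. When cman is down, cman_tool status exits non-zero,
    # local_node_name() returns nothing, and the volume_list grep can never
    # succeed, so the stop fails even though the setup itself is correct.
    if ! cman_tool status &> /dev/null; then
        ocf_log warning "cman is not running; skipping volume_list check and attempting best-effort deactivation"
    else
        if ! lvm dumpconfig activation/volume_list | grep "$(local_node_name)"; then
            ocf_log err "HA LVM: Improper setup detected"
            ocf_log err "* @$(local_node_name) missing from \"volume_list\" in lvm.conf"
            return $OCF_ERR_GENERIC
        fi
    fi

Whether skipping the check or deactivating by VG tag is the right behaviour is for the maintainers to decide; the point is that the current check cannot succeed once cman has been killed.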
When Red Hat shipped 6.8 on May 10, 2016, Red Hat Enterprise Linux 6 entered the Maintenance Support 1 Phase. https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_1_Phase That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released". RHEL 6 is now in the Maintenance Support 2 Phase, and this BZ does not appear to meet the Maintenance Support 2 Phase criteria, so it is being closed WONTFIX. If this is critical for your environment, please open a case in the Red Hat Customer Portal, https://access.redhat.com , provide a thorough business justification, and ask that the BZ be re-opened for consideration in the next minor release.