1658557 – After node eviction, during stopping of rgmanager, the error "unable to determine cluster node name, HA LVM: Improper setup detected" was logged.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	cluster
Sub Component:
Version:	6.5
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Christine Caulfield
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-12-12 12:13 UTC by SUNGTM
Modified:	2019-06-18 19:07 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-18 19:07:24 UTC
Target Upstream Version:
Embargoed:


Description SUNGTM 2018-12-12 12:13:48 UTC

Description of problem:

When cluster node A detects that the other node, B, has missed qdisk updates, it sends an eviction notice to B. On node B, cman is then killed, as the logs show. When rgmanager subsequently stops, the error "unable to find cluster node name, HA LVM: Improper setup detected" is logged.


Please confirm: once the cman components have been killed on the affected node, do all corosync/cman commands stop working?


We checked the shell script sources of the RHEL HA resource agents to trace the following error messages found in the log:

rgmanager[92694]: [lvm] HA LVM:  Improper setup detected
rgmanager[92704]: [lvm] * @ missing from "volume_list" in lvm.conf
rgmanager[92719]: [lvm] Owner of VG_DB/lv_db is not in the cluster


According to /usr/share/cluster/lvm.sh, these errors are logged when 'volume_list' does not contain the cluster member name:

    if ! lvm dumpconfig activation/volume_list | grep $(local_node_name); then
        ocf_log err "HA LVM:  Improper setup detected"
        ocf_log err "* @$(local_node_name) missing from \"volume_list\" in lvm.conf"
        return $OCF_ERR_GENERIC
    fi


The cluster member name is obtained from the function local_node_name in /usr/share/cluster/utils/member_util.sh:

    local_node_name()
    {
           ...
           ...

            if which cman_tool &> /dev/null; then
                    # Use cman_tool

                    line=$(cman_tool status | grep -i "Node name: $1")
                    [ -n "$line" ] || return 1
                    echo ${line/*name: /}
                    return 0
            fi
           ...
           ...

            return 1
    }
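
To make the suspected failure mode concrete, here is a minimal sketch that stubs `cman_tool` and `lvm` as shell functions so it runs without a cluster. The stubs, the `CMAN_UP` switch, the node name `node-b` and the `check_volume_list` wrapper are assumptions for illustration and are not part of the shipped agents. The grep output is also silenced here, which the excerpt above does not do. The sketch suggests that when `cman_tool status` prints nothing, `local_node_name` yields an empty string, the unquoted `grep` is left without a pattern and fails, and the message is printed with nothing after the `@`, as in the log lines quoted earlier.

```shell
#!/bin/bash
# Sketch only: cman_tool and lvm are stubbed so this runs without a cluster.
# CMAN_UP, node-b and check_volume_list are hypothetical names.

CMAN_UP=1

cman_tool() {
    # Stub: prints a node name line only while "cman" is considered up.
    if [ "$CMAN_UP" = 1 ]; then
        echo "Node name: node-b"
    fi
}

lvm() {
    # Stub standing in for "lvm dumpconfig activation/volume_list".
    echo 'volume_list=["vg_root", "@node-b"]'
}

local_node_name() {
    # Simplified from the member_util.sh excerpt above.
    local line
    line=$(cman_tool status | grep -i "Node name: $1") || return 1
    [ -n "$line" ] || return 1
    echo "${line/*name: /}"
}

check_volume_list() {
    # Same test as the lvm.sh excerpt; the substitution is deliberately
    # left unquoted, so an empty name leaves grep without a pattern.
    if ! lvm dumpconfig activation/volume_list | grep $(local_node_name) >/dev/null 2>&1; then
        echo "* @$(local_node_name) missing from \"volume_list\" in lvm.conf"
        return 1
    fi
    echo "ok"
}

check_volume_list                                   # cman up
CMAN_UP=0
check_volume_list || echo "check failed with status $?"   # cman killed
```

With the stub reporting a node name the check passes. With `CMAN_UP=0` it fails even though the stubbed `volume_list` is unchanged, which is consistent with the reading that the error reflects the missing node name rather than a misconfigured lvm.conf.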


Because A had already detected the missed qdisk updates from B, A evicted B, which caused the cman components on B to be killed.

A:
qdiskd[5904]: Writing eviction notice for node 2
qdiskd[5904]: Node 2 evicted

B:

corosync[5860]: cman killed by node 1 because we were killed by cman_tool or other application
rgmanager[8032]: #67: Shutting down uncleanly
fenced[6619]: cluster is down, exiting


Because the cman components were killed on DCN-02, we assume that no cman-related commands (cman_tool, ccs, etc.) work any more [PLEASE CONFIRM THIS], and the other processes were exiting as well.
Before the rgmanager process exited (uncleanly), it tried to stop all cluster resources.
First, the FS resource was stopped successfully. While stopping the LVM resource, the agent looked up the cluster member name through the cman_tool command, which produced no output. This is why the HA-LVM messages above were logged on node DCN-02.


Even though the 'HA LVM: Improper setup detected' error may be expected behaviour given the existing resource agent script (lvm.sh), it may be a bug in the context of LVM resource deactivation: the cman/corosync components have already been killed, so any command that depends on them will always return nothing.
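
As an illustration of the distinction drawn here, the following sketch separates "node name unavailable" from "name genuinely missing from volume_list" so that the two cases produce different messages. It is only one possible shape and not a fix shipped in resource-agents: `classify_volume_list_failure`, `name_ok`, `name_down` and the sample `volume_list` strings are all hypothetical. Whether deactivation should then proceed without the check is a separate safety question that this sketch does not address.

```shell
#!/bin/bash
# Sketch only: hypothetical helper, not part of resource-agents.

classify_volume_list_failure() {
    # $1: command that prints the local node name (e.g. local_node_name)
    # $2: text of activation/volume_list as reported by lvm
    local name
    name=$("$1" 2>/dev/null) || name=""
    if [ -z "$name" ]; then
        # Cannot tell whether lvm.conf is wrong; the cluster stack may be down.
        echo "node name unavailable (cluster stack down?)"
        return 2
    fi
    if ! printf '%s\n' "$2" | grep -qF -- "@$name"; then
        echo "@$name missing from volume_list"
        return 1
    fi
    echo "volume_list ok for $name"
}

# Hypothetical stand-ins for a node with cman running and with cman killed.
name_ok()   { echo "node-b"; }
name_down() { return 1; }

vl='volume_list=["vg_root", "@node-b"]'
classify_volume_list_failure name_ok   "$vl"
classify_volume_list_failure name_down "$vl" || true
classify_volume_list_failure name_ok 'volume_list=["vg_root"]' || true
```

The separate return code for the "unavailable" case is a design choice of the sketch: it would let a caller log a clearer message during an unclean shutdown instead of reporting an improper HA-LVM setup.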



Version-Release number of selected component (if applicable):

resource-agents-3.9.2-40.el6.x86_64

How reproducible:


Steps to Reproduce:
1. Node A evicts node B after B misses qdisk updates.
2. The cman components are killed on node B.
3. rgmanager begins shutting down. The file system resource is stopped successfully, but stopping (i.e. deactivating) the HA-LVM resource fails because the cman processes were killed before rgmanager stopped during the node eviction.



Actual results:

rgmanager[92694]: [lvm] HA LVM:  Improper setup detected
rgmanager[92704]: [lvm] * @ missing from "volume_list" in lvm.conf

Expected results:


HA-LVM resource should be successfully stopped.

Additional info:

Comment 2 Chris Williams 2019-06-18 19:07:24 UTC

When Red Hat shipped 6.8 on May 10, 2016, Red Hat Enterprise Linux 6 entered Maintenance Support 1 Phase.

https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_1_Phase

That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released". RHEL 6 is now in Maintenance Support 2 Phase, and this BZ does not appear to meet the Maintenance Support 2 Phase criteria, so it is being closed WONTFIX. If this is critical for your environment, please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification, and ask that the BZ be re-opened for consideration in the next minor release.
