Bug 1367813 - Shutting down N-1 nodes at once causes cluster with lms qdevice to lose quorum
Summary: Shutting down N-1 nodes at once causes cluster with lms qdevice to lose quorum
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jan Friesse
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 614122
 
Reported: 2016-08-17 14:28 UTC by Roman Bednář
Modified: 2016-11-04 06:50 UTC
CC List: 6 users

Fixed In Version: corosync-2.4.0-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 06:50:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch (2.51 KB, patch) - 2016-08-25 14:58 UTC, Jan Friesse
Patch with slightly better English comments (1.75 KB, patch) - 2016-08-30 14:26 UTC, Christine Caulfield
Man: Fix corosync-qdevice-net-certutil link (839 bytes, patch) - 2016-08-31 09:21 UTC, Jan Friesse
man: mention qdevice incompatibilites in votequorum.5 (1.60 KB, patch) - 2016-08-31 09:22 UTC, Jan Friesse


Links
Red Hat Product Errata RHBA-2016:2463 (priority normal, status SHIPPED_LIVE): corosync bug fix and enhancement update - last updated 2016-11-03 14:06:04 UTC

Description Roman Bednář 2016-08-17 14:28:46 UTC
Description of problem:

Shutting down N-1 nodes at once causes a cluster with an lms qdevice to lose quorum (see the summary). When the nodes are shut down one by one with a delay, the issue does not occur and the cluster retains quorum as expected, even with only one node online.

Version-Release number of selected component (if applicable):
corosync-qnetd-2.4.0-3.el7.x86_64
corosynclib-2.4.0-3.el7.x86_64
corosync-qdevice-2.4.0-3.el7.x86_64
corosync-2.4.0-3.el7.x86_64
pcs-0.9.152-6.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1) Have a 3-node cluster with a qdevice set up on a separate node, using the lms algorithm (a sketch of the setup commands appears after the expected results below).

2) Kill 2 nodes at once. Quorum is 3 at this point, and it seems that we lose a vote from the qdevice, causing the cluster to lose quorum.

3) Cluster and qdevice status:

# pcs qdevice status net
QNetd address:			*:5403
TLS:				Supported (client certificate required)
Connected clients:		1
Connected clusters:		1
Cluster "STSRHTS10485":
    Algorithm:		LMS
    Tie-breaker:	Node with lowest node ID
    Node ID 1:
        Client address:		::ffff:192.168.0.137:47712
        Configured node list:	1, 2, 3
        Membership node list:	1
        Vote:			NACK (NACK

# pcs quorum status
Quorum information
------------------
Date:             Wed Aug 17 16:09:55 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          1/192
Quorate:          No

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      1
Quorum:           3 Activity blocked
Flags:            Qdevice 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1   A,NV,NMW virt-136 (local)
         0          0            Qdevice (votes 2)


# pcs status
Cluster name: STSRHTS10485
Stack: corosync
Current DC: virt-136 (version 1.1.15-9.el7-e174ec8) - partition WITHOUT quorum
Last updated: Wed Aug 17 15:49:02 2016		Last change: Tue Aug 16 17:41:30 2016 by root via crm_node on virt-136

3 nodes and 12 resources configured

Node virt-139: UNCLEAN (offline)
Node virt-140: UNCLEAN (offline)
Online: [ virt-136 ]

Full list of resources:

 fence-virt-136	(stonith:fence_xvm):	Started virt-136
 fence-virt-139	(stonith:fence_xvm):	Started virt-139 (UNCLEAN)
 fence-virt-140	(stonith:fence_xvm):	Started virt-140 (UNCLEAN)
 fence-virt-141	(stonith:fence_xvm):	Started virt-139 (UNCLEAN)
 Clone Set: dlm-clone [dlm]
     dlm	(ocf::pacemaker:controld):	Started virt-139 (UNCLEAN)
     dlm	(ocf::pacemaker:controld):	Started virt-140 (UNCLEAN)
     Started: [ virt-136 ]
 Clone Set: clvmd-clone [clvmd]
     clvmd	(ocf::heartbeat:clvm):	Started virt-139 (UNCLEAN)
     clvmd	(ocf::heartbeat:clvm):	Started virt-140 (UNCLEAN)
     Started: [ virt-136 ]
 IP	(ocf::heartbeat:IPaddr2):	Started virt-136
 Webserver	(ocf::heartbeat:apache):	Started virt-136

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


# pcs quorum device status
Qdevice information
-------------------
Model:			Net
Node ID:		1
Configured node list:
    0	Node ID = 1
    1	Node ID = 2
    2	Node ID = 3
Membership node list:	1

Qdevice-net information
----------------------
Cluster name:		STSRHTS10485
QNetd host:		192.168.0.136:5403
Algorithm:		LMS
Tie-breaker:		Node with lowest node ID
State:			Connected
======================================================

Actual results:
quorum lost

Expected results:
Quorum should be retained, since the qdevice still has a connection to/from the remaining node: with the qdevice contributing 2 votes under LMS, the remaining node would have 1 + 2 = 3 of the 5 expected votes, which meets the quorum of 3. This is essentially the purpose of a qdevice in such a setup, and its advantage over a 'standard' LMS cluster.
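
For reference, a qdevice with the lms algorithm as in step 1 can be configured roughly as follows. This is only a minimal sketch: the exact pcs syntax is assumed for the package versions listed above, and "qnetd-host" is a placeholder name.

# on the separate qnetd host
pcs qdevice setup model net --enable --start

# on one of the cluster nodes
pcs quorum device add model net host=qnetd-host algorithm=lms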

Comment 2 Jan Friesse 2016-08-17 15:38:36 UTC
Reassigning to Chrissie because LMS is her field.

Comment 4 Jan Friesse 2016-08-22 08:32:21 UTC
@Martin: Is the same problem also happening with ffsplit?

Comment 5 Jan Friesse 2016-08-22 08:40:08 UTC
QE guys, debug logs would be helpful. Qnetd logs to syslog, but debugging has to be enabled first, so open /etc/sysconfig/corosync-qnetd and change the line:

COROSYNC_QNETD_OPTIONS=""

to

COROSYNC_QNETD_OPTIONS="-dd"

Qdevice logging depends on the corosync.conf configuration, but syslog is generally enabled, so please add the following:

logging {
...
        logger_subsys {
                subsys: QDEVICE
                debug: on
        }
...
}

to /etc/corosync/corosync.conf before reporting problems.
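
For convenience, both changes can be applied roughly like this. A minimal sketch, assuming the stock systemd unit names corosync-qnetd and corosync-qdevice; adjust as needed.

# on the qnetd host
sed -i 's/^COROSYNC_QNETD_OPTIONS=.*/COROSYNC_QNETD_OPTIONS="-dd"/' /etc/sysconfig/corosync-qnetd
systemctl restart corosync-qnetd

# on each cluster node, after adding the logger_subsys block above to /etc/corosync/corosync.conf
systemctl restart corosync-qdevice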

Comment 6 Jan Friesse 2016-08-22 08:54:19 UTC
Also a comment on both "reports": I'm unable to reproduce either of them (trying killall -9 corosync or the sysrq trigger). Would you mind sharing corosync.conf (together with debug logs)?

Comment 8 Jan Friesse 2016-08-25 14:22:15 UTC
As discussed with Martin, I was able to reproduce the issue. The node really has to be crashed rather than just stopping corosync and/or qdevice. Basically, what happens is:
- Node 1 dies, but the disconnect cannot be sent
- Node 2 finds out node 1 is dead and starts forming a new membership, sending the membership change to qnetd
- The qnetd LMS algo still sees Node 1 as alive and Node 2 as split but not the leader -> sends NACK to Node 2
- Eventually qnetd finds out Node 1 died

The solution used in ffsplit is that qnetd_algo_lms_client_disconnect is handled and the current status is re-evaluated. This is probably not a good choice for lms, because lms keeps its vote (if it has one) until a membership change (to overcome the problem of an accidental disconnect from qnetd).
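
For reference, a hard node crash (as opposed to a clean corosync/qdevice shutdown) can be simulated roughly like this. A minimal sketch; it assumes magic sysrq is available on the victim nodes, and 'b' reboots immediately without any clean shutdown, so the corosync disconnect is never sent.

# on each node to be killed
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger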

Comment 9 Jan Friesse 2016-08-25 14:58:31 UTC
Created attachment 1194051 [details]
Proposed patch

Solves the situation when the tie-breaker node dies in a 2-node cluster. Because the code contained two bugs, the other node got NACK instead of ACK.

- The algo timer is not a stack, so calling abort and schedule in the timer callback without setting reschedule is a no-op.
- It's necessary to check not only what the current node thinks about the membership, but also what the other nodes think. If the views diverge -> wait.

Comment 10 Jan Friesse 2016-08-26 07:14:42 UTC
Just a note, I'm still unable to reproduce the first bug reported by Roman. Roman, can you please paste the logs as Martin did?

Comment 12 Jan Friesse 2016-08-30 07:12:05 UTC
Martin,
thanks for the logs. Next time, please make sure to set

logging {
...
        logger_subsys {
                subsys: QDEVICE
                debug: on
        }
...
}

(please note subsys: QDEVICE, not subsys: VOTEQ). Anyway, I'm fairly confident that the proposed patch also solves this problem. Would you mind testing a scratch build?

Comment 16 Jan Friesse 2016-08-30 12:32:54 UTC
Sounds great, thanks for testing!

Comment 17 Christine Caulfield 2016-08-30 14:26:24 UTC
Created attachment 1195934 [details]
Patch with slightly better English comments

ACK to the patch, thanks for spotting that. 

I've fixed the English in the comments somewhat but the logic seems fine to me.

Comment 18 Jan Friesse 2016-08-30 15:00:50 UTC
Chrissie, thanks for the review. The patch is now upstream as b0c850f308d44ddcdf1a1f881c1e1142ad489385

Comment 19 Jan Friesse 2016-08-31 09:21:58 UTC
Created attachment 1196271 [details]
Man: Fix corosync-qdevice-net-certutil link

Comment 20 Jan Friesse 2016-08-31 09:22:23 UTC
Created attachment 1196272 [details]
man: mention qdevice incompatibilites in votequorum.5

Comment 24 errata-xmlrpc 2016-11-04 06:50:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2463.html
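
To confirm the fixed build from the advisory is installed (corosync-2.4.0-4.el7 or later, per "Fixed In Version"), something like the following check can be used; package names are those already listed in this report.

# on each cluster node and on the qnetd host
rpm -q corosync corosync-qdevice corosync-qnetd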

