Bug 1367813
| Summary: | Shutting down N-1 nodes at once causes cluster with lms qdevice to lose quorum | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Roman Bednář <rbednar> |
| Component: | corosync | Assignee: | Jan Friesse <jfriesse> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | ccaulfie, cluster-maint, jfriesse, jkortus, mjuricek, rbednar |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | corosync-2.4.0-4.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-04 06:50:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 614122 | | |
| Attachments: | | | |
Reassigning to Chrissie because LMS is her field.

@Martin: Is the same problem also happening with ffsplit?

QE guys, debug logs would be helpful. Qnetd logs to syslog, but debug output has to be enabled: open /etc/sysconfig/corosync-qnetd and change the line
COROSYNC_QNETD_OPTIONS=""
to
COROSYNC_QNETD_OPTIONS="-dd"
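For the new option to take effect, the daemon has to be restarted and its output can then be followed; a minimal sketch, assuming corosync-qnetd runs as a systemd service and logs to the journal/syslog:

systemctl restart corosync-qnetd     # pick up the new COROSYNC_QNETD_OPTIONS
journalctl -u corosync-qnetd -f      # follow the debug output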
Qdevice logging depends on the corosync.conf configuration, but syslog is generally enabled, so please use the following
logging {
...
logger_subsys {
subsys: QDEVICE
debug: on
}
...
}
configuration in /etc/corosync/corosync.conf before reporting problems.
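For context, the qdevice part of the configuration discussed in this report lives in the quorum section of corosync.conf. A minimal sketch for a 3-node cluster using the lms algorithm might look roughly as follows; the qnetd-host address is a placeholder and the exact values are assumptions, not taken from the reporter's actual configuration:

quorum {
    provider: corosync_votequorum
    device {
        # with LMS the qdevice typically carries (number of nodes - 1) votes,
        # i.e. 2 votes in a 3-node cluster as seen in the status output below
        votes: 2
        model: net
        net {
            host: qnetd-host
            algorithm: lms
            tie_breaker: lowest
        }
    }
}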
Also a comment on both "reports": I'm unable to reproduce either of them (trying killall -9 corosync or a sysrq trigger). Would you mind sharing corosync.conf (together with debug logs)?

As discussed with Martin, I was able to reproduce the issue. It is really necessary to crash the node rather than just stopping corosync and/or qdevice (a rough sketch of this is included below, after the patch notes). Basically what happens:
- Node 1 dies but the disconnect cannot be sent
- Node 2 finds out node 1 is dead and starts forming a new membership, sending the membership change to qnetd
- The qnetd LMS algo still sees Node 1 as alive and Node 2 as split but not the leader -> sends NACK to Node 2
- Eventually qnetd finds out Node 1 died

The solution used in ffsplit is that qnetd_algo_lms_client_disconnect is handled and the current status is re-evaluated. This is probably not a good choice for lms, because lms keeps its vote (if it has one) until a change (to overcome the problem of an accidental disconnect from qnetd).

Created attachment 1194051 [details]
Proposed patch
Solves the situation when the tie-breaker node dies in a 2-node cluster. Because
the code contains two bugs, the other node got a NACK instead of an ACK.
- The algo timer is not a stack, so calling abort and schedule in the timer
  callback without setting reschedule is a no-op.
- It's necessary to check not only what the current node thinks about the
  membership, but also what the other nodes think. If the views diverge -> wait.
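As noted above, the failure only shows up when a node is crashed hard, so that corosync-qdevice never gets a chance to send a disconnect to qnetd. A rough way to simulate that on a test node (a sketch only, assuming root access; do not run this on a machine you care about):

echo 1 > /proc/sys/kernel/sysrq      # make sure sysrq is enabled
echo c > /proc/sysrq-trigger         # hard-crash the node immediately

# on a surviving node, watch the membership and quorum state
corosync-quorumtool -s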
Just a note, I'm still unable to reproduce the first bug reported by Roman. Roman, can you please paste the logs as Martin did?

Martin,
thanks for the logs. Next time, please make sure to set
logging {
...
logger_subsys {
subsys: QDEVICE
debug: on
}
...
}
(please note subsys: QDEVICE, not subsys: VOTEQ). Anyway, I kind of believe that the proposed patch also solves this problem. Would you mind testing a scratch build?
Sounds great, thanks for testing!

Created attachment 1195934 [details]
Patch with slightly better English comments
ACK to the patch, thanks for spotting that.
I've fixed the English in the comments somewhat but the logic seems fine to me.
Chrissie, thanks for the review. The patch is now upstream as b0c850f308d44ddcdf1a1f881c1e1142ad489385.

Created attachment 1196271 [details]
Man: Fix corosync-qdevice-net-certutil link
Created attachment 1196272 [details]
man: mention qdevice incompatibilities in votequorum.5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2463.html
Description of problem:
See subject. Also, when the nodes are shut down one by one with a delay, this issue does not occur and the cluster retains quorum as expected, even with only one node online.

Version-Release number of selected component (if applicable):
corosync-qnetd-2.4.0-3.el7.x86_64
corosynclib-2.4.0-3.el7.x86_64
corosync-qdevice-2.4.0-3.el7.x86_64
corosync-2.4.0-3.el7.x86_64
pcs-0.9.152-6.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1) have a 3 node cluster with qdevice set up on a separate node, using the lms algorithm
2) kill 2 nodes at once; quorum is 3 at this point and it seems that we lose the vote from qdevice, causing the cluster to lose quorum
3) cluster and qdevice status:

# pcs qdevice status net
QNetd address: *:5403
TLS: Supported (client certificate required)
Connected clients: 1
Connected clusters: 1
Cluster "STSRHTS10485":
    Algorithm: LMS
    Tie-breaker: Node with lowest node ID
    Node ID 1:
        Client address: ::ffff:192.168.0.137:47712
        Configured node list: 1, 2, 3
        Membership node list: 1
        Vote: NACK (NACK)

# pcs quorum status
Quorum information
------------------
Date: Wed Aug 17 16:09:55 2016
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 1
Ring ID: 1/192
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 1
Quorum: 3 Activity blocked
Flags: Qdevice

Membership information
----------------------
    Nodeid  Votes  Qdevice   Name
         1      1  A,NV,NMW  virt-136 (local)
         0      0            Qdevice (votes 2)

# pcs status
Cluster name: STSRHTS10485
Stack: corosync
Current DC: virt-136 (version 1.1.15-9.el7-e174ec8) - partition WITHOUT quorum
Last updated: Wed Aug 17 15:49:02 2016
Last change: Tue Aug 16 17:41:30 2016 by root via crm_node on virt-136

3 nodes and 12 resources configured

Node virt-139: UNCLEAN (offline)
Node virt-140: UNCLEAN (offline)
Online: [ virt-136 ]

Full list of resources:

fence-virt-136 (stonith:fence_xvm): Started virt-136
fence-virt-139 (stonith:fence_xvm): Started virt-139 (UNCLEAN)
fence-virt-140 (stonith:fence_xvm): Started virt-140 (UNCLEAN)
fence-virt-141 (stonith:fence_xvm): Started virt-139 (UNCLEAN)
Clone Set: dlm-clone [dlm]
    dlm (ocf::pacemaker:controld): Started virt-139 (UNCLEAN)
    dlm (ocf::pacemaker:controld): Started virt-140 (UNCLEAN)
    Started: [ virt-136 ]
Clone Set: clvmd-clone [clvmd]
    clvmd (ocf::heartbeat:clvm): Started virt-139 (UNCLEAN)
    clvmd (ocf::heartbeat:clvm): Started virt-140 (UNCLEAN)
    Started: [ virt-136 ]
IP (ocf::heartbeat:IPaddr2): Started virt-136
Webserver (ocf::heartbeat:apache): Started virt-136

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

# pcs quorum device status
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
    0 Node ID = 1
    1 Node ID = 2
    2 Node ID = 3
Membership node list: 1

Qdevice-net information
----------------------
Cluster name: STSRHTS10485
QNetd host: 192.168.0.136:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected

======================================================

Actual results:
quorum lost

Expected results:
Quorum should be retained, since we still have a qdevice connection from/to the remaining node. This is basically the purpose of qdevice in such a setup, and it's an advantage over a 'standard' LMS cluster.
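For reference, step 1 of the reproducer (qdevice with the LMS algorithm on a separate node) can be set up roughly as follows with pcs; this is a sketch only, with a placeholder qnetd-host name, assuming pcs 0.9.152 or later and an already running 3-node cluster:

# on the qdevice host (not a cluster member): set up and start qnetd
pcs qdevice setup model net --enable --start

# on one of the cluster nodes: attach the cluster to qnetd using the LMS algorithm
pcs quorum device add model net host=qnetd-host algorithm=lms

# verify the qdevice state from the cluster side
pcs quorum device status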