Bug 1780137

Summary: Adding quorum device requires restart to clear WaitForAll flag
Product: Red Hat Enterprise Linux 8
Reporter: Josef Zimek <pzimek>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 8.4
CC: ccaulfie, cluster-maint, cluster-qe, jfriesse, mnovacek, ondrej-redhat-developer, phagara
Target Milestone: rc
Target Release: 8.0
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: corosync-3.0.3-4.el8
Type: Bug
Clone Of: 1780134
Bug Blocks: 1780134
Last Closed: 2020-11-04 03:25:51 UTC

Attachments:
  votequorum: Reflect runtime change of 2Node to WFA (flags: none)
  votequorum: Ignore the icmap_get_* return value (flags: none)

Comment 2 Jan Friesse 2020-01-21 15:49:15 UTC
Created attachment 1654290 [details]
votequorum: Reflect runtime change of 2Node to WFA

votequorum: Reflect runtime change of 2Node to WFA

When 2Node mode is set, WFA is also set unless WFA is configured
explicitly. This behavior was not reflected on runtime change, so a
restarted corosync behaved differently (WFA not set). Also, when a
cluster is reduced from three nodes to two at runtime, WFA was not
set, which may result in two quorate partitions.

The solution is to set WFA based on 2Node whenever WFA is not
explicitly configured.

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Christine Caulfield <ccaulfie>
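
Why this matters: with two_node set, quorum drops to a single vote, so after a split each node could become quorate on its own; WFA guards against that by withholding quorum at startup until all nodes have been seen at least once. Below is a minimal, self-contained sketch of the rule the patch applies; the helper name and layout are hypothetical, not the actual corosync source (the real logic lives in votequorum's configuration handling):

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: WFA follows 2Node on every configuration
 * (re)load unless wait_for_all was configured explicitly, in which
 * case the explicit value wins.
 */
static uint8_t effective_wfa(uint8_t two_node, int wfa_is_explicit,
                             uint8_t wfa_configured)
{
    if (wfa_is_explicit) {
        return wfa_configured;  /* explicit setting always wins */
    }
    return two_node;            /* otherwise WFA tracks 2Node */
}

int main(void)
{
    /* two_node removed at runtime (e.g. by adding a qdevice),
     * wait_for_all never configured: WFA must drop as well. */
    printf("WFA=%u\n", (unsigned)effective_wfa(0, 0, 0)); /* WFA=0 */

    /* two_node re-added at runtime: WFA comes back with it. */
    printf("WFA=%u\n", (unsigned)effective_wfa(1, 0, 0)); /* WFA=1 */
    return 0;
}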

Comment 3 Jan Friesse 2020-01-21 15:54:12 UTC
Created attachment 1654293 [details]
votequorum: Ignore the icmap_get_* return value

votequorum: Ignore the icmap_get_* return value

Express the intention to ignore the icmap_get_* return value and rely
on the default behavior of not changing the output parameter on error.

Signed-off-by: Jan Friesse <jfriesse>
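
For illustration, the pattern reads roughly as below. This is a self-contained sketch: the stub stands in for the real icmap getters (internal to corosync), mimicking the behavior the commit message relies on, i.e. returning an error and leaving the output parameter untouched when a key is absent.

#include <stdint.h>
#include <stdio.h>

/* Stub with icmap-like semantics: non-zero return on error (key not
 * found) and *out left untouched. Hypothetical, for illustration. */
static int stub_icmap_get_uint8(const char *key, uint8_t *out)
{
    (void)key;
    (void)out;
    return -1; /* pretend the key is missing */
}

int main(void)
{
    uint8_t two_node = 0; /* pre-set default */

    /* The (void) cast states that the return value is ignored on
     * purpose; on error the pre-set default simply survives. */
    (void)stub_icmap_get_uint8("quorum.two_node", &two_node);

    printf("two_node=%u\n", (unsigned)two_node); /* prints two_node=0 */
    return 0;
}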

Comment 4 Jan Friesse 2020-01-23 14:56:11 UTC
For QE: The bug reproducer is described in comment 1. I've tested it by simply setting two_node: 1 in corosync.conf.

corosync.conf:
...
quorum {
    provider: corosync_votequorum
    two_node: 1
...

# corosync-quorumtool
...
Flags:            2Node WaitForAll 
...

Change corosync.conf so it doesn't contain two_node:
...
quorum {
    provider: corosync_votequorum
...

# corosync-cfgtool -R
# corosync-quorumtool
...
Flags:
...

Add two_node back:
...
quorum {
    provider: corosync_votequorum
    two_node: 1
...

# corosync-quorumtool
...
Flags:            2Node WaitForAll 
...

Comment 5 Patrik Hagara 2020-05-14 13:54:23 UTC
qa_ack+, repro in description and comment#4

Comment 8 michal novacek 2020-09-17 09:50:21 UTC

Common part
-----------

A quorum device was added to a two-node cluster [2] following the RHEL 8 workflow [1].

Start with a two-node cluster:
> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            2Node Quorate WaitForAll 

> [root@virt-245 ~]# pcs quorum device add model net host=virt-020
...

# The two_node directive is gone from corosync.conf even after sync.
> [root@virt-245 ~]# pcs cluster sync corosync
virt-245: Succeeded
virt-246: Succeeded
> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf

Before the fix corosync-3.0.3-2.el8.x86_64
------------------------------------------

# WaitForAll flag is still present after the quorum device was added
> [root@virt-245 ~]#  pcs quorum status | grep Flags
Flags:            Quorate WaitForAll Qdevice                    <<<<<<<<<<<<

<cluster stop and start>

# WaitForAll flag is gone
> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            Quorate Qdevice

# Removing the quorum device reintroduces 2Node, and WaitForAll comes back as well
> [root@virt-245 ~]# pcs quorum device remove
...

> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags             
Flags:            2Node Quorate WaitForAll                      <<<<<<<<<<<<



After the fix corosync-3.0.3-4.el8.x86_64
-----------------------------------------

# WaitForAll flag is gone after the quorum device is added
> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            Quorate Qdevice

# Removing the quorum device reintroduces the 2Node and WaitForAll flags
> [root@virt-245 ~]# pcs quorum device remove
...

> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            2Node Quorate WaitForAll

-----

> [1]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/index


> [2]:
[root@virt-245 ~]# pcs quorum status       
Quorum information
------------------
Date:             Thu Sep 17 11:17:30 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1.2c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1  
Flags:            2Node Quorate WaitForAll 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1         NR virt-245 (local)
         2          1         NR virt-246

[root@virt-245 ~]# pcs status
Cluster name: STSRHTS19388
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-246 (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
  * Last updated: Thu Sep 17 11:17:37 2020
  * Last change:  Thu Sep 17 09:51:35 2020 by root via cibadmin on virt-245
  * 2 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-245 virt-246 ]

Full List of Resources:
  * fence-virt-245      (stonith:fence_xvm):    Started virt-245
  * fence-virt-246      (stonith:fence_xvm):    Started virt-246
  * dummy       (ocf::pacemaker:Dummy): Started virt-245
  * fence-virt-020      (stonith:fence_xvm):    Started virt-246

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Comment 11 errata-xmlrpc 2020-11-04 03:25:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (corosync bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4736