Bug 1780137

Summary: Adding quorum device requires restart to clear WaitForAll flag
Product: Red Hat Enterprise Linux 8
Reporter: Josef Zimek <pzimek>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 8.4
CC: ccaulfie, cluster-maint, cluster-qe, jfriesse, mnovacek, ondrej-redhat-developer, phagara
Target Milestone: rc
Target Release: 8.0
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: corosync-3.0.3-4.el8
Type: Bug
Clone Of: 1780134
Bug Blocks: 1780134
Last Closed: 2020-11-04 03:25:51 UTC

Attachments:
  votequorum: Reflect runtime change of 2Node to WFA (flags: none)
  votequorum: Ignore the icmap_get_* return value (flags: none)

Comment 2 Jan Friesse 2020-01-21 15:49:15 UTC
Created attachment 1654290 [details]
votequorum: Reflect runtime change of 2Node to WFA

votequorum: Reflect runtime change of 2Node to WFA

When 2Node mode is set, WFA is also set unless WFA is configured
explicitly. This behavior was not reflected on runtime change, so a
restarted corosync behaved differently (WFA not set). Also, when a
cluster is reduced from three nodes to two at runtime, WFA was not
set, which may result in two quorate partitions.

The solution is to set WFA based on 2Node whenever WFA is not
explicitly configured.

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Christine Caulfield <ccaulfie>
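
Why this matters: with two_node set, quorum drops to a single vote, so after a split each node could become quorate on its own; WFA guards against that by withholding quorum at startup until all nodes have been seen at least once. Below is a minimal, self-contained sketch of the rule the patch applies; the helper name and layout are hypothetical, not the actual corosync source (the real logic lives in votequorum's configuration handling):

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: WFA follows 2Node on every configuration
 * (re)load unless wait_for_all was configured explicitly, in which
 * case the explicit value wins.
 */
static uint8_t effective_wfa(uint8_t two_node, int wfa_is_explicit,
                             uint8_t wfa_configured)
{
    if (wfa_is_explicit) {
        return wfa_configured;  /* explicit setting always wins */
    }
    return two_node;            /* otherwise WFA tracks 2Node */
}

int main(void)
{
    /* two_node removed at runtime (e.g. by adding a qdevice),
     * wait_for_all never configured: WFA must drop as well. */
    printf("WFA=%u\n", (unsigned)effective_wfa(0, 0, 0)); /* WFA=0 */

    /* two_node re-added at runtime: WFA comes back with it. */
    printf("WFA=%u\n", (unsigned)effective_wfa(1, 0, 0)); /* WFA=1 */
    return 0;
}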

Comment 3 Jan Friesse 2020-01-21 15:54:12 UTC
Created attachment 1654293 [details]
votequorum: Ignore the icmap_get_* return value

votequorum: Ignore the icmap_get_* return value

Express the intention to ignore the icmap_get_* return value and rely
on the default behavior of not changing the output parameter on error.

Signed-off-by: Jan Friesse <jfriesse>
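
For illustration, the pattern reads roughly as below. This is a self-contained sketch: the stub stands in for the real icmap getters (internal to corosync), mimicking the behavior the commit message relies on, i.e. returning an error and leaving the output parameter untouched when a key is absent.

#include <stdint.h>
#include <stdio.h>

/* Stub with icmap-like semantics: non-zero return on error (key not
 * found) and *out left untouched. Hypothetical, for illustration. */
static int stub_icmap_get_uint8(const char *key, uint8_t *out)
{
    (void)key;
    (void)out;
    return -1; /* pretend the key is missing */
}

int main(void)
{
    uint8_t two_node = 0; /* pre-set default */

    /* The (void) cast states that the return value is ignored on
     * purpose; on error the pre-set default simply survives. */
    (void)stub_icmap_get_uint8("quorum.two_node", &two_node);

    printf("two_node=%u\n", (unsigned)two_node); /* prints two_node=0 */
    return 0;
}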

Comment 4 Jan Friesse 2020-01-23 14:56:11 UTC
For QE: The bug reproducer is described in comment 1. I've tested it by simply setting two_node: 1 in corosync.conf.

corosync.conf:
...
quorum {
    provider: corosync_votequorum
    two_node: 1
...

# corosync-quorumtool
...
Flags:            2Node WaitForAll 
...

Change corosync.conf so it doesn't contain two_node:
...
quorum {
    provider: corosync_votequorum
...

# corosync-cfgtool -R
# corosync-quorumtool
...
Flags:
...

Add two_node back:
...
quorum {
    provider: corosync_votequorum
    two_node: 1
...

# corosync-quorumtool
...
Flags:            2Node WaitForAll 
...

Comment 5 Patrik Hagara 2020-05-14 13:54:23 UTC
qa_ack+, repro in description and comment#4

Comment 8 michal novacek 2020-09-17 09:50:21 UTC

Common part
-----------

A quorum device was added to a two-node cluster [2] following the RHEL 8 workflow [1].

Start with a two-node cluster:
> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            2Node Quorate WaitForAll 

> [root@virt-245 ~]# pcs quorum device add model net host=virt-020
...

# The two_node directive is gone from corosync.conf even after sync.
> [root@virt-245 ~]# pcs cluster sync corosync
virt-245: Succeeded
virt-246: Succeeded
> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf

Before the fix corosync-3.0.3-2.el8.x86_64
------------------------------------------

# WaitForAll flag is still present after the quorum device was added
> [root@virt-245 ~]#  pcs quorum status | grep Flags
Flags:            Quorate WaitForAll Qdevice                    <<<<<<<<<<<<

<cluster stop and start>

# WaitForAll flag is gone
> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            Quorate Qdevice

# Removing the quorum device reintroduces 2Node, and WaitForAll comes back as well
> [root@virt-245 ~]# pcs quorum device remove
...

> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags             
Flags:            2Node Quorate WaitForAll                      <<<<<<<<<<<<



After the fix corosync-3.0.3-4.el8.x86_64
-----------------------------------------

# WaitForAll flag is gone after the quorum device is added
> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            Quorate Qdevice

# Removing the quorum device reintroduces the 2Node and WaitForAll flags
> [root@virt-245 ~]# pcs quorum device remove
...

> [root@virt-245 ~]# grep two_node /etc/corosync/corosync.conf
    two_node: 1

> [root@virt-245 ~]# pcs quorum status | grep Flags
Flags:            2Node Quorate WaitForAll

-----

> [1]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/configuring_and_managing_high_availability_clusters/index


> [2]:
[root@virt-245 ~]# pcs quorum status       
Quorum information
------------------
Date:             Thu Sep 17 11:17:30 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1.2c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1  
Flags:            2Node Quorate WaitForAll 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1         NR virt-245 (local)
         2          1         NR virt-246

[root@virt-245 ~]# pcs status
Cluster name: STSRHTS19388
Cluster Summary:
  * Stack: corosync
  * Current DC: virt-246 (version 2.0.3-5.el8_2.1-4b1f869f0f) - partition with quorum
  * Last updated: Thu Sep 17 11:17:37 2020
  * Last change:  Thu Sep 17 09:51:35 2020 by root via cibadmin on virt-245
  * 2 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ virt-245 virt-246 ]

Full List of Resources:
  * fence-virt-245      (stonith:fence_xvm):    Started virt-245
  * fence-virt-246      (stonith:fence_xvm):    Started virt-246
  * dummy       (ocf::pacemaker:Dummy): Started virt-245
  * fence-virt-020      (stonith:fence_xvm):    Started virt-246

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Comment 11 errata-xmlrpc 2020-11-04 03:25:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (corosync bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4736