Bug 1826914 - [OCP4.4][NMO] [API] Putting more than one master node into maintenance should be prevented or not allowed.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Maintenance Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Marc Sluiter
QA Contact: mlammon
URL: https://github.com/kubevirt/node-main...
Whiteboard:
Depends On:
Blocks: 1826908
 
Reported: 2020-04-22 18:30 UTC by mlammon
Modified: 2021-02-06 07:08 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Validate maintenance requests for master nodes. Reason: Prevent master (etcd) quorum violation. Result: Master nodes can only be set into maintenance if the etcd-quorum-guard PDB allows it.
Clone Of: 1826908
Environment:
Last Closed: 2020-10-27 15:58:27 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2020:4196 (last updated 2020-10-27 15:58:46 UTC)

Description mlammon 2020-04-22 18:30:40 UTC
+++ This bug was initially created as a clone of Bug #1826908 +++
This clone is for the API as well; see the API sketch after the reproduction steps below.

Description of problem:
[NMO] Putting more than one master node into maintenance should be prevented or not allowed.
The current workflows allow the attempt to place more than one master node into maintenance;
the node will not actually go into maintenance, because the drain is blocked to prevent etcd
quorum loss.

It probably needs to be prevented in the UI (the original bug) and/or the API; this clone was
created for the API.

Tested version and steps:
Version:

[root@sealusa6 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-21-210658   True        False         98m     Cluster version is 4.4.0-0.nightly-2020-04-21-210658  

Steps to reproduce:
1. Deploy a cluster with CNV (which includes the node-maintenance-operator)
2. Select Compute -> Nodes
3. Open the 3-dot menu on the master-0-0 node and select "Start Maintenance"

< wait for the master-0-0 node to be successfully put under maintenance >

4. Open the 3-dot menu on the master-0-1 node and select "Start Maintenance"

< it will not go into maintenance because of the etcd-quorum-guard >
The understanding is that the etcd-quorum-guard is blocking the drain in order to protect etcd quorum.
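The same flow can be exercised through the API, which is what this clone covers. Below is a minimal, hedged Go sketch that creates a NodeMaintenance CR for a second master using the dynamic client; the nodemaintenance.kubevirt.io/v1beta1 group/version, the cluster scope, and the spec field names are assumptions that should be verified against the CRD installed by the operator.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed GroupVersionResource for the NodeMaintenance CRD.
	gvr := schema.GroupVersionResource{
		Group:    "nodemaintenance.kubevirt.io",
		Version:  "v1beta1",
		Resource: "nodemaintenances",
	}

	// Assumed spec fields (nodeName, reason); the CR name is arbitrary.
	nm := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "nodemaintenance.kubevirt.io/v1beta1",
		"kind":       "NodeMaintenance",
		"metadata":   map[string]interface{}{"name": "maintenance-master-0-1"},
		"spec": map[string]interface{}{
			"nodeName": "master-0-1",
			"reason":   "testing etcd quorum protection",
		},
	}}

	// With the fix in place, this request should be rejected while another
	// master is already under maintenance and the etcd-quorum-guard PDB
	// forbids a further disruption.
	created, err := dyn.Resource(gvr).Create(context.TODO(), nm, metav1.CreateOptions{})
	if err != nil {
		fmt.Println("request rejected or failed:", err)
		return
	}
	fmt.Println("created NodeMaintenance", created.GetName())
}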

Actual results:
It attempts to put the master-0-1 node into maintenance and eventually throws a warning:
"Workloads failing to move
drain did not complete after 1m0s" < see attached image >

This is because the etcd-quorum-guard-xxxxxx pod is preventing the drain.


Expected results:
The UI/API should prevent the user from trying to put a second and/or third master node into maintenance.



Additional info:

Looking at the docs, even though etcd-quorum-guard has moved into the machine-config-operator [1]:

Taken from [2]
"The etcd Quorum Guard ensures that quorum is maintained for etcd for OpenShift.

For the etcd cluster to remain usable, we must maintain quorum, which is a majority of all etcd members. For example, an etcd cluster with 3 members (i.e. a 3 master deployment) must have at least 2 healthy etcd members to meet the quorum limit.

There are situations where 2 etcd members could be down at once:

a master has gone offline and the MachineConfig Controller (MCC) tries to rollout a new MachineConfig (MC) by rebooting masters
the MCC is doing a MachineConfig rollout and doesn't wait for the etcd on the previous master to become healthy again before rebooting the next master
In short, we need a way to ensure that a drain on a master is not allowed to proceed if the reboot of the master would cause etcd quorum loss."


[1] https://github.com/openshift/machine-config-operator/blob/master/install/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
[2] https://github.com/openshift/etcd-quorum-guard


Comment 2 Stephen Cuppett 2020-04-23 20:05:26 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 3 Andrew Beekhof 2020-05-06 12:47:45 UTC
We should fix this at the operator level so that the same logic applies to CLI-driven changes.

Comment 4 Michael Moser 2020-05-07 04:49:32 UTC
I think that the following check should be added to the reconcile loop, prior to proceeding with taking a node to maintenance mode:

  if node_is_a_master_node(nodeName) {
      if (number_of_active_master_nodes() - 1 < quorum_size()) {
          report error, don't allow the reconcile to proceed
      }
  }


bool node_is_a_master_node(nodeName)
	the node object for nodeName has the label node-role.kubernetes.io/master

int quorum_size()
	return number_of_master_nodes() / 2 + 1

int number_of_master_nodes()
	return the number of nodes with the label node-role.kubernetes.io/master

int number_of_active_master_nodes()
	return the number of nodes with the label node-role.kubernetes.io/master and node.Spec.Unschedulable == false
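A minimal Go sketch of that check, assuming a controller-runtime client is available in the reconciler (the helper names here are illustrative, not the operator's actual API):

package nodemaintenance

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const masterLabel = "node-role.kubernetes.io/master"

// listMasters returns all nodes carrying the master label.
func listMasters(ctx context.Context, c client.Client) ([]corev1.Node, error) {
	var nodes corev1.NodeList
	if err := c.List(ctx, &nodes, client.HasLabels{masterLabel}); err != nil {
		return nil, err
	}
	return nodes.Items, nil
}

// checkMasterQuorum returns an error if putting nodeName into maintenance
// would leave fewer schedulable masters than the etcd quorum size.
func checkMasterQuorum(ctx context.Context, c client.Client, nodeName string) error {
	masters, err := listMasters(ctx, c)
	if err != nil {
		return err
	}

	isMaster, active := false, 0
	for _, n := range masters {
		if n.Name == nodeName {
			isMaster = true
		}
		if !n.Spec.Unschedulable {
			active++
		}
	}
	if !isMaster {
		return nil // non-master nodes raise no quorum concern
	}

	quorum := len(masters)/2 + 1
	if active-1 < quorum {
		return fmt.Errorf("maintenance of %s would violate etcd quorum (%d masters, quorum %d, %d currently schedulable)",
			nodeName, len(masters), quorum, active)
	}
	return nil
}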

Comment 5 Michael Moser 2020-05-07 10:26:22 UTC
A PR for this issue was opened (https://github.com/kubevirt/node-maintenance-operator/pull/76) and is under review.

Comment 6 Michael Moser 2020-05-14 10:20:32 UTC
During review of the PR an issue was identified by Nir: If baremetal machine turns unhealthy then the node object for that node is deleted. This will skew the number of master nodes counted and will alter the calculation of the required quorum size.

We must therefore count the number of currently unhealthy/not running master nodes.

- Andrew suggested to use the etcd client for that task, I was not able to copy the etcd client3 library into the vendored dependencies of node-maintenance-operator.

The currently vendored google protobufs is in conflict with that
required by cluster-etcd-operator.
(in addition it is not clear how to get the required certificates to operate the etcd client3)

So this approach is now blocked by the task of transitioning to mod packaging.

There is an alternative solution to get the number of disabled master nodes from the pod disruption budget, but that's an ongoing discussion that has not been concuded.
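A sketch of that PDB-based alternative, which matches the eventual Doc Text ("master nodes can only be set into maintenance if the etcd-quorum-guard PDB allows it"). The PDB name and namespace below are assumptions to verify on the target cluster, and clusters of this vintage may only serve policy/v1beta1 rather than policy/v1:

package nodemaintenance

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// masterMaintenanceAllowed returns an error unless the etcd-quorum-guard
// PodDisruptionBudget currently permits at least one more disruption.
func masterMaintenanceAllowed(ctx context.Context, c client.Client) error {
	var pdb policyv1.PodDisruptionBudget
	key := types.NamespacedName{
		Namespace: "openshift-machine-config-operator", // assumed location of the guard PDB
		Name:      "etcd-quorum-guard",                 // assumed PDB name
	}
	if err := c.Get(ctx, key, &pdb); err != nil {
		return fmt.Errorf("failed to read etcd-quorum-guard PDB: %w", err)
	}
	if pdb.Status.DisruptionsAllowed < 1 {
		return fmt.Errorf("etcd-quorum-guard PDB allows no further disruptions; refusing master maintenance")
	}
	return nil
}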

Comment 10 mlammon 2020-09-16 12:38:17 UTC
This is resolved.   We can move to verified.

Comment 12 errata-xmlrpc 2020-10-27 15:58:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

