Bug 1826914

Summary:	[OCP4.4][NMO] [API] Putting more than one master node into maintenance should be prevented or not allowed.
Product:	OpenShift Container Platform	Reporter:	mlammon
Component:	Node Maintenance Operator	Assignee:	Marc Sluiter <msluiter>
Status:	CLOSED ERRATA	QA Contact:	mlammon
Severity:	medium	Docs Contact:
Priority:	high
Version:	4.4	CC:	abeekhof, aos-bugs, gharden, jokerman, jtomasek, mfojtik, msluiter, scuppett, spadgett, ukalifon
Target Milestone:	---	Keywords:	Triaged
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
URL:	https://github.com/kubevirt/node-maintenance-operator/pull/76
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	Feature: Validate maintenance requests for master nodes. Reason: Prevent master (etcd) quorum violation. Result: Master nodes can only be set into maintenance if the etcd-quorum-guard PDB allows it.	Story Points:	---
Clone Of:	1826908	Environment:
Last Closed:	2020-10-27 15:58:27 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1826908

Description mlammon 2020-04-22 18:30:40 UTC

+++ This bug was initially created as a clone of Bug #1826908 +++
This is for the API as well.

Description of problem:
[NMO] More then one master node should be prevented or not allowed.  The current workflows
allow the attempt to place more then one master node into maintenance.  It will not go
into maintenance because trying to prevent etcd quorum loss.

It probably needs to be prevented from UI (this bug) and/ or both for API. A duplicate
will be generated for API.

Tested version steps:
Version:

[root@sealusa6 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-21-210658   True        False         98m     Cluster version is 4.4.0-0.nightly-2020-04-21-210658  

Steps to reproduce:
1. Deploy cluster with CNV (which has node-maintenace-operator)
2. select compute -> Nodes
3. select 3 dots on any master-0-0 node and "Start Maintenance"
 
< wait for successfully putting master-0-0 node Under maintenance >

4. select 3 dots on any master-0-1 node and "Start Maintenance"

< It will not go into maintenance > because of the etcd-quorum-guard >
The undestanding is

Actual results:
It attempts to put the master-0-1 node into maintenance and eventually throw a warning
"Workloads failing to move
drain did not complete after 1m0s" < see attached image>

This is because etcd-quorum-guard-xxxxxx is preventing. 


Expected results:
It should prevent the user from trying the second and/or third master node from UI/API



Additional info:

Looking at doc even though moved to machine-config-operator [1]

Taken from [2]
"The etcd Quorum Guard ensures that quorum is maintained for etcd for OpenShift.

For the etcd cluster to remain usable, we must maintain quorum, which is a majority of all etcd members. For example, an etcd cluster with 3 members (i.e. a 3 master deployment) must have at least 2 healthy etcd member to meet the quorum limit.

There are situations where 2 etcd members could be down at once:

a master has gone offline and the MachineConfig Controller (MCC) tries to rollout a new MachineConfig (MC) by rebooting masters
the MCC is doing a MachineConfig rollout and doesn't wait for the etcd on the previous master to become healthy again before rebooting the next master
In short, we need a way to ensure that a drain on a master is not allowed to proceed if the reboot of the master would cause etcd quorum loss."


[1] https://github.com/openshift/machine-config-operator/blob/master/install/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
[2] https://github.com/openshift/etcd-quorum-guard

--- Additional comment from  on 2020-04-22 18:26:17 UTC ---

Comment 2 Stephen Cuppett 2020-04-23 20:05:26 UTC

Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 3 Andrew Beekhof 2020-05-06 12:47:45 UTC

We should fix this at the operator level so that the same logic applies to CLI driven changes

Comment 4 Michael Moser 2020-05-07 04:49:32 UTC

I think that the following check should be added to the reconcile loop, prior to proceeding with taking a node to maintenance mode:

  if node_is_a_master_node(nodeName) {
	   if (number_of_active_master_nodes() - 1 < quorum_size() {
			report error, don't allow reconcile to proceed.
	   } 
   }


bool node_is_a_master_node(nodeName)
	node object for nodeName has label node-role.kubernetes.io/master

int quorum_size()
	return number_of_master_nodes() / 2 + 1 

int number_of_master_nodes()
	return number of nodes with label node-role.kubernetes.io/master 

int number_of_active_master_nodes()
	return number of nodes with label with label node-role.kubernetes.io/master and node.Spec.Unschedulable == False

Comment 5 Michael Moser 2020-05-07 10:26:22 UTC

PR for this issue was added https://github.com/kubevirt/node-maintenance-operator/pull/76 and is under review.

Comment 6 Michael Moser 2020-05-14 10:20:32 UTC

During review of the PR an issue was identified by Nir: If baremetal machine turns unhealthy then the node object for that node is deleted. This will skew the number of master nodes counted and will alter the calculation of the required quorum size.

We must therefore count the number of currently unhealthy/not running master nodes.

- Andrew suggested to use the etcd client for that task, I was not able to copy the etcd client3 library into the vendored dependencies of node-maintenance-operator.

The currently vendored google protobufs is in conflict with that
required by cluster-etcd-operator.
(in addition it is not clear how to get the required certificates to operate the etcd client3)

So this approach is now blocked by the task of transitioning to mod packaging.

There is an alternative solution to get the number of disabled master nodes from the pod disruption budget, but that's an ongoing discussion that has not been concuded.

Comment 7 Marc Sluiter 2020-08-05 07:50:01 UTC

Fixed in https://github.com/kubevirt/node-maintenance-operator/pull/107 and https://github.com/kubevirt/node-maintenance-operator/pull/111

Comment 10 mlammon 2020-09-16 12:38:17 UTC

This is resolved.   We can move to verified.

Comment 12 errata-xmlrpc 2020-10-27 15:58:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196