Bug 1763695 - MCO applies new manifests to host before host is drained

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Machine Config Operator
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	4.2.z
Assignee:	Antonio Murdaca
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:	1762536
Blocks:	1763696

Reported:	2019-10-21 11:24 UTC by Antonio Murdaca
Modified:	2020-01-22 10:47 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1762536
Clones:	1763696
Environment:
Last Closed:	2020-01-22 10:46:40 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 1194	0	None	None	None	2019-11-07 17:58:20 UTC
Red Hat Product Errata	RHBA-2020:0107	0	None	None	None	2020-01-22 10:47:03 UTC

Description Antonio Murdaca 2019-10-21 11:24:14 UTC

+++ This bug was initially created as a clone of Bug #1762536 +++

Description of problem:
During an upgrade, the MCO (or a component thereof) applies manifests (specifically, static pod manifests) before the drain has completed. The net result is that if a bad etcd image is pushed, we can lose etcd quorum.

Version-Release number of selected component (if applicable):
4.1+

How reproducible:
Edge case, but 100% reproducible if a bad etcd image makes it into the upgrade graph.

Steps to Reproduce:
1. Deploy cluster
2. Upgrade cluster to release with bad etcd version

Actual results:
Cluster loses etcd quorum

Expected results:
The cluster stops applying new manifests to other nodes. The cluster should always drain a node before the MCD applies new manifests; this allows us to respect the etcd-quorum-guard PDB.

Additional info:

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16

Suggested Fix:

Drain first, then apply static pod manifests. This will result in etcd-quorum-guard protecting us in the future in case we somehow get a bad etcd image.
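
The suggested ordering can be sketched roughly as follows. This is a minimal illustration, not the MCO's actual code: `update_node`, `DrainBlocked`, `record`, and `refuse_drain` are hypothetical names, and the three steps are stand-in callables. The point it shows is that nothing is written to disk unless the drain succeeded.

```python
from typing import Callable, List


class DrainBlocked(Exception):
    """Raised by the (hypothetical) drain step when an eviction is refused."""


def update_node(drain: Callable[[], None],
                write_files: Callable[[], None],
                reboot: Callable[[], None]) -> bool:
    """Run the update steps in the suggested order: drain, write, reboot.

    Returns True if the node was updated, False if the drain was refused.
    """
    try:
        drain()
    except DrainBlocked:
        # The drain was refused, so the host is left untouched.
        return False
    write_files()
    reboot()
    return True


def record(log: List[str], step: str) -> Callable[[], None]:
    """Build a stand-in step that only records that it ran."""
    return lambda: log.append(step)


def refuse_drain() -> None:
    """Stand-in drain that is always refused."""
    raise DrainBlocked("eviction would violate a PodDisruptionBudget")


if __name__ == "__main__":
    log: List[str] = []
    print(update_node(record(log, "drain"), record(log, "write"), record(log, "reboot")), log)
    log = []
    print(update_node(refuse_drain, record(log, "write"), record(log, "reboot")), log)
```

In the second call the drain is refused, so neither the write step nor the reboot step runs.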

--- Additional comment from Antonio Murdaca on 2019-10-17 11:20:41 UTC ---

(In reply to Michael Gugino from comment #0)
> Description of problem:
> During an upgrade, MCO (or component thereof) applies manifests
> (specifically, static pod manifests) before drain has been completed.  The
> net result is if a bad etcd image is pushed, we can lose etcd quorum.
> 
> Version-Release number of selected component (if applicable):
> 4.1+
> 
> How reproducible:
> Edge case, but 100% reproducible if a bad etcd image makes it into the
> upgrade graph.
> 
> Steps to Reproduce:
> 1. Deploy cluster
> 2. Upgrade cluster to release with bad etcd version
> 
> Actual results:
> Cluster loses etcd quorum
> 
> Expected results:
> Cluster stops applying new manifests to other nodes; Cluster should always
> drain a node before MCD applies new manifests, this will allow us to respect
> etcd-quorum-guard PDB.

So MCD isn't really _applying_ manifests, it just drops files around the host fs.
For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).

We do indeed drop files well before the drain. What ordering is expected here?

1. Drain
2. write files to disk
3. reboot

How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
I feel I'm missing something here.

I would have expected the bad etcd-member yaml to have already been caught by the time we drop the manifests and go to drain.

> 
> Additional info:
> 
> Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16
> 
> Suggested Fix:
> 
> Drain first, then apply static pod manifests. This will result in
> etcd-quorum-guard protecting us in the future in case we get a bad
> etcd-image somehow.

--- Additional comment from Michael Gugino on 2019-10-17 21:50:28 UTC ---

(In reply to Antonio Murdaca from comment #1)
> So MCD isn't really _applying_ manifests, it just drops files around the
> host fs.
> For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).

If you place a static pod manifest on the disk, kubelet starts it.  If it's an update to an existing static pod, kubelet kills the old pod and starts the new one.
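
As a toy model of that behavior (not the kubelet's real implementation), the sketch below compares the manifests on disk with what is currently running and lists the resulting actions. The function names and the `etcd-member.yaml` file name in the usage are illustrative assumptions.

```python
import hashlib
from typing import Dict, List


def digest(content: str) -> str:
    """Fingerprint a manifest so content changes can be detected."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def reconcile(running: Dict[str, str], manifests: Dict[str, str]) -> List[str]:
    """Return the actions needed to make `running` match `manifests`.

    `running` maps a manifest file name to the digest of the manifest its
    pod was started from; it is updated in place. `manifests` maps a file
    name to the manifest content currently on disk.
    """
    actions: List[str] = []
    for name in sorted(manifests):
        new = digest(manifests[name])
        old = running.get(name)
        if old is None:
            # New manifest file: start the pod.
            actions.append(f"start {name}")
        elif old != new:
            # Changed manifest file: stop the old pod, start the new one.
            actions.append(f"stop {name}")
            actions.append(f"start {name}")
        running[name] = new
    for name in sorted(set(running) - set(manifests)):
        # Manifest file removed: stop the pod.
        actions.append(f"stop {name}")
        del running[name]
    return actions


if __name__ == "__main__":
    running: Dict[str, str] = {}
    print(reconcile(running, {"etcd-member.yaml": "image: good"}))
    print(reconcile(running, {"etcd-member.yaml": "image: bad"}))
```

The second call models the problem in this bug: the moment the updated file lands on disk, the old pod is replaced, whether or not the node has been drained.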

> How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
> I feel I'm missing something here.

I don't know what the MCO's current order of operations is, but the flow should probably be:
1) Drain
2) Everything else happens after drain.

--- Additional comment from Antonio Murdaca on 2019-10-17 22:31:05 UTC ---

(In reply to Michael Gugino from comment #2)
> (In reply to Antonio Murdaca from comment #1)
> > So MCD isn't really _applying_ manifests, it just drops files around the
> > host fs.
> > For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).
> 
> If you place a static pod manifest on the disk, kubelet starts it.  If it's
> an update to an existing static pod, kubelet kills the old pod and starts
> the new one.
> 
> > How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
> > I feel I'm missing something here.
> 
> I don't know the order of operations of what MCO currently is.  But the flow
> probably should be
> 1) Drain
> 2) Everything else happens after drain.

That's exactly my point: in the MCO, if we do the drain _first_, then we have:

- apply files (and manifests)
- reboot

Now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't see how this is different from before.

--- Additional comment from Michael Gugino on 2019-10-17 22:49:40 UTC ---

(In reply to Antonio Murdaca from comment #3)
> now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't seem to see how this is different from before uhm

So, if we drain first, then apply manifests, then reboot, here's why that's better:

The etcd-quorum-guard will no longer be running on that host, so it doesn't matter whether the new etcd static pod starts before the reboot or not. What matters is that the *next* host will not change anything, because it will block on drain while the first host's etcd-quorum-guard is not ready. Once etcd-quorum-guard becomes ready again, drain will proceed on the second master. If it does not become ready (for example, because the etcd static pod is broken), drain stays blocked and the second master never gets the manifests.
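
The argument above can be checked with a small simulation. It is a sketch under stated assumptions, not a model of the real MCO or PDB controller: three masters, a PDB that tolerates at most one unavailable etcd-quorum-guard pod, a guard pod that is ready exactly when etcd on its node is healthy, and nodes updated strictly one at a time. `rolling_update` and `has_quorum` are hypothetical names.

```python
from typing import List

MASTERS = 3
QUORUM = MASTERS // 2 + 1        # 2 of 3 etcd members
MIN_READY_GUARDS = MASTERS - 1   # assumed PDB: at most one guard unavailable


def has_quorum(etcd_healthy: List[bool]) -> bool:
    """True while a majority of etcd members is healthy."""
    return sum(etcd_healthy) >= QUORUM


def rolling_update(drain_first: bool, new_image_ok: bool) -> List[bool]:
    """Simulate a rolling update of the masters; return per-node etcd health."""
    etcd_healthy = [True] * MASTERS
    for node in range(MASTERS):
        if not drain_first:
            # Write-first: the new static pod manifest lands before the
            # drain, so etcd on this node restarts with the new image now.
            etcd_healthy[node] = new_image_ok
        # Draining evicts this node's guard pod. The eviction is refused
        # if it would leave fewer ready guards than the PDB requires.
        ready_guards = sum(etcd_healthy)
        ready_after_eviction = ready_guards - (1 if etcd_healthy[node] else 0)
        if ready_after_eviction < MIN_READY_GUARDS:
            break  # drain blocked; the rollout stalls on this node
        # Drain succeeded: write files (if not already written), reboot,
        # and the node comes back running the new image.
        etcd_healthy[node] = new_image_ok
    return etcd_healthy


if __name__ == "__main__":
    for drain_first in (True, False):
        state = rolling_update(drain_first, new_image_ok=False)
        print("drain_first =", drain_first, state, "quorum:", has_quorum(state))
```

With a bad image, the drain-first run stops after the first master and keeps two healthy members, while the write-first run breaks a second member before its drain is refused, which is the quorum loss described in this bug.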

Comment 2 Michael Nguyen 2020-01-15 17:31:38 UTC

Verifying this BZ directly would require a version with a bad etcd. No log messages indicate the change in behavior, which is the separation of the drain and reboot functionality. Development verified it by running the upgrade 100 times. I have checked the CI e2e 4.2 upgrades and also tested upgrading 4.2.0-0.nightly-2020-01-13-060909 to 4.2.0-0.nightly-2020-01-14-110551 10 times, and it succeeded each time. I am considering this verified; we can re-open if necessary.

Comment 4 errata-xmlrpc 2020-01-22 10:46:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0107
