Bug 1763696

Summary:	MCO applies new manifests to host before host is drained - backport
Product:	OpenShift Container Platform	Reporter:	Antonio Murdaca <amurdaca>
Component:	Machine Config Operator	Assignee:	Antonio Murdaca <amurdaca>
Status:	CLOSED WONTFIX	QA Contact:	Michael Nguyen <mnguyen>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.1.z	CC:	kgarriso, mgugino, mnguyen, mpatel, smilner
Target Milestone:	---
Target Release:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1763695	Environment:
Last Closed:	2020-01-20 18:16:05 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1762536, 1763695
Bug Blocks:

Description Antonio Murdaca 2019-10-21 11:26:30 UTC

+++ This bug was initially created as a clone of Bug #1763695 +++

+++ This bug was initially created as a clone of Bug #1762536 +++

Description of problem:
During an upgrade, MCO (or component thereof) applies manifests (specifically, static pod manifests) before drain has been completed.  The net result is if a bad etcd image is pushed, we can lose etcd quorum.

Version-Release number of selected component (if applicable):
4.1+

How reproducible:
Edge case, but 100% reproducible if a bad etcd image makes it into the upgrade graph.

Steps to Reproduce:
1. Deploy cluster
2. Upgrade cluster to release with bad etcd version

Actual results:
Cluster loses etcd quorum

Expected results:
Cluster stops applying new manifests to other nodes; Cluster should always drain a node before MCD applies new manifests, this will allow us to respect etcd-quorum-guard PDB.

Additional info:

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16

Suggested Fix:

Drain first, then apply static pod manifests. This will result in etcd-quorum-guard protecting us in the future in case we get a bad etcd-image somewhow.

--- Additional comment from Antonio Murdaca on 2019-10-17 11:20:41 UTC ---

(In reply to Michael Gugino from comment #0)
> Description of problem:
> During an upgrade, MCO (or component thereof) applies manifests
> (specifically, static pod manifests) before drain has been completed.  The
> net result is if a bad etcd image is pushed, we can lose etcd quorum.
> 
> Version-Release number of selected component (if applicable):
> 4.1+
> 
> How reproducible:
> Edge case, but 100% reproducible if a bad etcd image makes it into the
> upgrade graph.
> 
> Steps to Reproduce:
> 1. Deploy cluster
> 2. Upgrade cluster to release with bad etcd version
> 
> Actual results:
> Cluster loses etcd quorum
> 
> Expected results:
> Cluster stops applying new manifests to other nodes; Cluster should always
> drain a node before MCD applies new manifests, this will allow us to respect
> etcd-quorum-guard PDB.

So MCD isn't really _applying_ manifests, it just drops files around the host fs.
For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).

Now, we drop files well before drain indeed, what is expected here?

1. Drain
2. write files to disk
3. reboot

How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
I feel I'm missing something here.

I would have expected that by the time we drop the manifests and go to drain, the bad etcd-member yaml to be already caught

> 
> Additional info:
> 
> Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16
> 
> Suggested Fix:
> 
> Drain first, then apply static pod manifests. This will result in
> etcd-quorum-guard protecting us in the future in case we get a bad
> etcd-image somewhow.

--- Additional comment from Michael Gugino on 2019-10-17 21:50:28 UTC ---

(In reply to Antonio Murdaca from comment #1)
> So MCD isn't really _applying_ manifests, it just drops files around the
> host fs.
> For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).

If you place a static pod manifest on the disk, kubelet starts it.  If it's an update to an existing static pod, kubelet kills the old pod and starts the new one.

> How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
I feel I'm missing something here.

I don't know the order or operations of what MCO currently is.  But the flow probably should be
1) Drain
2) Everything else happens after drain.

--- Additional comment from Antonio Murdaca on 2019-10-17 22:31:05 UTC ---

(In reply to Michael Gugino from comment #2)
> (In reply to Antonio Murdaca from comment #1)
> > So MCD isn't really _applying_ manifests, it just drops files around the
> > host fs.
> > For etcd specifically it dumps kubernetes yamls under /etc/kubernetes (iirc).
> 
> If you place a static pod manifest on the disk, kubelet starts it.  If it's
> an update to an existing static pod, kubelet kills the old pod and starts
> the new one.
> 
> > How do we make sure that by the time we drop the static manifests we don't incur in a reboot already?
> I feel I'm missing something here.
> 
> I don't know the order or operations of what MCO currently is.  But the flow
> probably should be
> 1) Drain
> 2) Everything else happens after drain.

that's exactly my point... in MCO, if we _first_ do drain then we have:

- apply files (and manifests)
- reboot

now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't seem to see how this is different from before uhm

--- Additional comment from Michael Gugino on 2019-10-17 22:49:40 UTC ---

(In reply to Antonio Murdaca from comment #3)
> now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't seem to see how this is different from before uhm

So, if we drain first, then apply manifests, then reboot, here's why that's better:

The etcd-quorum-guard will no longer be running on that host.  So, the new etcd static pod can start before the reboot or not, doesn't matter.  What matters is, the *next* host will not change anything because the next host will block on drain due to the first host not having etcd-quorum-guard ready.  Once etcd-quorum-guard becomes ready again, drain will then proceed on the second master.  If it does not become ready (in case of etcd static pod is broken for some reason), then drain will be blocked, and the second master will never get the manifests.

Comment 1 Kirsten Garrison 2019-11-07 19:40:33 UTC

waiting for cherrypick from 4.2 PR: https://github.com/openshift/machine-config-operator/pull/1194#issuecomment-544471735

Comment 4 Red Hat Bugzilla 2023-09-14 05:44:46 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days