Description of problem:
During an upgrade, the MCO (or a component thereof) applies manifests (specifically, static pod manifests) before drain has completed. The net result is that if a bad etcd image is pushed, we can lose etcd quorum.

Version-Release number of selected component (if applicable):
4.1+

How reproducible:
Edge case, but 100% reproducible if a bad etcd image makes it into the upgrade graph.

Steps to Reproduce:
1. Deploy cluster
2. Upgrade cluster to a release with a bad etcd version

Actual results:
Cluster loses etcd quorum

Expected results:
Cluster stops applying new manifests to other nodes. The cluster should always drain a node before the MCD applies new manifests; this will allow us to respect the etcd-quorum-guard PDB.

Additional info:

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16

Suggested Fix:

Drain first, then apply static pod manifests. This will result in etcd-quorum-guard protecting us in the future in case we somehow get a bad etcd image.
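For context, here is a minimal sketch of what a PodDisruptionBudget like the etcd-quorum-guard one looks like, expressed with client-go types. The name, namespace, and label selector below are assumptions for illustration, not copied from the shipped manifest:

package main

import (
	"fmt"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	// Assumed values; the real object's name/namespace/labels may differ.
	maxUnavailable := intstr.FromInt(1)
	pdb := policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "etcd-quorum-guard",
			Namespace: "openshift-machine-config-operator",
		},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			// With three masters and maxUnavailable=1, the eviction API
			// refuses to evict a guard pod while another guard pod is
			// already unavailable. That refusal is what can make drain
			// block, which is the point of the suggested fix.
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"name": "etcd-quorum-guard"},
			},
		},
	}
	out, err := yaml.Marshal(pdb)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}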
(In reply to Michael Gugino from comment #0)
> Description of problem:
> During an upgrade, the MCO (or a component thereof) applies manifests
> (specifically, static pod manifests) before drain has completed. The net
> result is that if a bad etcd image is pushed, we can lose etcd quorum.
>
> Version-Release number of selected component (if applicable):
> 4.1+
>
> How reproducible:
> Edge case, but 100% reproducible if a bad etcd image makes it into the
> upgrade graph.
>
> Steps to Reproduce:
> 1. Deploy cluster
> 2. Upgrade cluster to a release with a bad etcd version
>
> Actual results:
> Cluster loses etcd quorum
>
> Expected results:
> Cluster stops applying new manifests to other nodes. The cluster should
> always drain a node before the MCD applies new manifests; this will allow
> us to respect the etcd-quorum-guard PDB.

So the MCD isn't really _applying_ manifests; it just drops files onto the host filesystem. For etcd specifically, it writes Kubernetes YAMLs under /etc/kubernetes (IIRC).

Now, we do indeed drop files well before drain. What is expected here?

1. Drain
2. Write files to disk
3. Reboot

How do we make sure that by the time we drop the static manifests we don't already trigger a reboot? I feel I'm missing something here. I would have expected that, by the time we drop the manifests and go to drain, the bad etcd-member YAML would already have been caught.

> Additional info:
>
> Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16
>
> Suggested Fix:
>
> Drain first, then apply static pod manifests. This will result in
> etcd-quorum-guard protecting us in the future in case we somehow get a
> bad etcd image.
(In reply to Antonio Murdaca from comment #1)
> So the MCD isn't really _applying_ manifests; it just drops files onto the
> host filesystem. For etcd specifically, it writes Kubernetes YAMLs under
> /etc/kubernetes (IIRC).

If you place a static pod manifest on disk, the kubelet starts it. If it's an update to an existing static pod, the kubelet kills the old pod and starts the new one.

> How do we make sure that by the time we drop the static manifests we don't
> already trigger a reboot? I feel I'm missing something here.

I don't know the current order of operations in the MCO, but the flow probably should be:

1) Drain
2) Everything else happens after drain (a sketch of this ordering follows)
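A minimal sketch of that proposed ordering, assuming a simplified daemon. All names here are illustrative, not the actual machine-config-daemon code:

package main

import "log"

// daemon stands in for the MCD; every identifier here is hypothetical.
type daemon struct{}

// drain evicts pods via the eviction API, which honors PDBs, so it
// blocks while the etcd-quorum-guard budget is already exhausted.
func (d *daemon) drain() error {
	log.Println("draining node")
	return nil
}

// writeFiles lays down the new config, including static pod manifests
// under /etc/kubernetes; the kubelet restarts etcd as soon as they land.
func (d *daemon) writeFiles() error {
	log.Println("writing files and static pod manifests")
	return nil
}

func (d *daemon) reboot() error {
	log.Println("rebooting into the new config")
	return nil
}

// update is the proposed order: drain strictly before any file lands on
// disk, so a bad etcd manifest can only take down an already-drained member.
func (d *daemon) update() error {
	if err := d.drain(); err != nil {
		return err
	}
	if err := d.writeFiles(); err != nil {
		return err
	}
	return d.reboot()
}

func main() {
	if err := (&daemon{}).update(); err != nil {
		log.Fatal(err)
	}
}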
(In reply to Michael Gugino from comment #2)
> If you place a static pod manifest on disk, the kubelet starts it. If it's
> an update to an existing static pod, the kubelet kills the old pod and
> starts the new one.
>
> I don't know the current order of operations in the MCO, but the flow
> probably should be:
> 1) Drain
> 2) Everything else happens after drain

That's exactly my point. In the MCO, if we drain _first_, then we have:

- apply files (and manifests)
- reboot

Now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't see how this is different from before.
(In reply to Antonio Murdaca from comment #3)
> Now, who can guarantee that kube starts the etcd member pod and validates
> it before we call into reboot? I can't see how this is different from
> before.

If we drain first, then apply manifests, then reboot, here's why that's better: the etcd-quorum-guard pod will no longer be running on that host, so whether the new etcd static pod starts before the reboot or not doesn't matter. What matters is that the *next* host will not change anything, because the next host's drain will block while the first host's etcd-quorum-guard is not ready. Once etcd-quorum-guard becomes ready again, drain proceeds on the second master. If it does not become ready (because the etcd static pod is broken for some reason), drain stays blocked and the second master never gets the new manifests.
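A sketch of the mechanism that makes this blocking work, assuming a drain implemented on top of the Kubernetes eviction API (the function name and retry interval are illustrative): the API server answers 429 Too Many Requests when an eviction would violate a PDB, and drain simply keeps retrying until the budget allows it.

package drain

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictWithRetry keeps requesting an eviction until the PDB allows it.
// While the first master's etcd-quorum-guard pod is not ready, evicting
// the second master's guard pod would violate the budget, so the API
// server returns 429 and the second master's drain never completes.
func evictWithRetry(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) error {
	eviction := &policyv1beta1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
	}
	for {
		err := cs.PolicyV1beta1().Evictions(pod.Namespace).Evict(ctx, eviction)
		if err == nil {
			return nil // evicted; drain can move on to the next pod
		}
		if !apierrors.IsTooManyRequests(err) {
			return err // a real failure, not a PDB rejection
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
			// Budget still exhausted: etcd is degraded somewhere; retry.
		}
	}
}

That retry-on-429 behavior is what turns the quorum guard into a gate for the rolling master upgrade: a broken etcd static pod on one master leaves its guard pod not ready, which holds every subsequent master's drain indefinitely.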
Verifying this BZ would require a release with a bad etcd image. There are no log messages that would indicate the change in behavior (the separation of the drain and reboot functionality). Development verified it by running the upgrade 100 times. I have checked the CI e2e 4.2-to-4.3 upgrade jobs from when this BZ was fixed and have not seen any related failures, so I am considering this verified. We can reopen if the issue reappears.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062