Description of problem:
During an upgrade, the MCO (or a component thereof) applies manifests (specifically, static pod manifests) before drain has completed. The net result is that if a bad etcd image is pushed, we can lose etcd quorum.

Version-Release number of selected component (if applicable):
4.1+

How reproducible:
Edge case, but 100% reproducible if a bad etcd image makes it into the upgrade graph.

Steps to Reproduce:
1. Deploy cluster
2. Upgrade cluster to a release with a bad etcd version

Actual results:
Cluster loses etcd quorum

Expected results:
Cluster stops applying new manifests to other nodes. The cluster should always drain a node before the MCD applies new manifests; this will allow us to respect the etcd-quorum-guard PDB.

Additional info:

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16

Suggested Fix:

Drain first, then apply static pod manifests. This will result in etcd-quorum-guard protecting us in the future in case we somehow get a bad etcd image.
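For context, here is a minimal sketch of what a PodDisruptionBudget like the etcd-quorum-guard one looks like, expressed with client-go types. The name, namespace, and label selector below are assumptions for illustration, not copied from the shipped manifest:

package main

import (
	"fmt"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	// Assumed values; the real object's name/namespace/labels may differ.
	maxUnavailable := intstr.FromInt(1)
	pdb := policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "etcd-quorum-guard",
			Namespace: "openshift-machine-config-operator",
		},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			// With three masters and maxUnavailable=1, the eviction API
			// refuses to evict a guard pod while another guard pod is
			// already unavailable. That refusal is what can make drain
			// block, which is the point of the suggested fix.
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"name": "etcd-quorum-guard"},
			},
		},
	}
	out, err := yaml.Marshal(pdb)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}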
(In reply to Michael Gugino from comment #0)
> Description of problem:
> During an upgrade, the MCO (or a component thereof) applies manifests
> (specifically, static pod manifests) before drain has completed. The net
> result is that if a bad etcd image is pushed, we can lose etcd quorum.
>
> Version-Release number of selected component (if applicable):
> 4.1+
>
> How reproducible:
> Edge case, but 100% reproducible if a bad etcd image makes it into the
> upgrade graph.
>
> Steps to Reproduce:
> 1. Deploy cluster
> 2. Upgrade cluster to a release with a bad etcd version
>
> Actual results:
> Cluster loses etcd quorum
>
> Expected results:
> Cluster stops applying new manifests to other nodes. The cluster should
> always drain a node before the MCD applies new manifests; this will allow
> us to respect the etcd-quorum-guard PDB.

So the MCD isn't really _applying_ manifests; it just drops files onto the host filesystem. For etcd specifically, it writes Kubernetes YAMLs under /etc/kubernetes (IIRC).

Now, we do indeed drop files well before drain. What is expected here?

1. Drain
2. Write files to disk
3. Reboot

How do we make sure that by the time we drop the static manifests we don't already trigger a reboot? I feel I'm missing something here. I would have expected that, by the time we drop the manifests and go to drain, the bad etcd-member YAML would already have been caught.

> Additional info:
>
> Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1761557#c16
>
> Suggested Fix:
>
> Drain first, then apply static pod manifests. This will result in
> etcd-quorum-guard protecting us in the future in case we somehow get a
> bad etcd image.
(In reply to Antonio Murdaca from comment #1)
> So the MCD isn't really _applying_ manifests; it just drops files onto the
> host filesystem. For etcd specifically, it writes Kubernetes YAMLs under
> /etc/kubernetes (IIRC).

If you place a static pod manifest on disk, the kubelet starts it. If it's an update to an existing static pod, the kubelet kills the old pod and starts the new one.

> How do we make sure that by the time we drop the static manifests we don't
> already trigger a reboot? I feel I'm missing something here.

I don't know the current order of operations in the MCO, but the flow probably should be:

1) Drain
2) Everything else happens after drain (a sketch of this ordering follows)
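A minimal sketch of that proposed ordering, assuming a simplified daemon. All names here are illustrative, not the actual machine-config-daemon code:

package main

import "log"

// daemon stands in for the MCD; every identifier here is hypothetical.
type daemon struct{}

// drain evicts pods via the eviction API, which honors PDBs, so it
// blocks while the etcd-quorum-guard budget is already exhausted.
func (d *daemon) drain() error {
	log.Println("draining node")
	return nil
}

// writeFiles lays down the new config, including static pod manifests
// under /etc/kubernetes; the kubelet restarts etcd as soon as they land.
func (d *daemon) writeFiles() error {
	log.Println("writing files and static pod manifests")
	return nil
}

func (d *daemon) reboot() error {
	log.Println("rebooting into the new config")
	return nil
}

// update is the proposed order: drain strictly before any file lands on
// disk, so a bad etcd manifest can only take down an already-drained member.
func (d *daemon) update() error {
	if err := d.drain(); err != nil {
		return err
	}
	if err := d.writeFiles(); err != nil {
		return err
	}
	return d.reboot()
}

func main() {
	if err := (&daemon{}).update(); err != nil {
		log.Fatal(err)
	}
}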
(In reply to Michael Gugino from comment #2)
> If you place a static pod manifest on disk, the kubelet starts it. If it's
> an update to an existing static pod, the kubelet kills the old pod and
> starts the new one.
>
> I don't know the current order of operations in the MCO, but the flow
> probably should be:
> 1) Drain
> 2) Everything else happens after drain

That's exactly my point. In the MCO, if we drain _first_, then we have:

- apply files (and manifests)
- reboot

Now, who can guarantee that kube starts the etcd member pod and validates it before we call into reboot? I can't see how this is different from before.
(In reply to Antonio Murdaca from comment #3)
> Now, who can guarantee that kube starts the etcd member pod and validates
> it before we call into reboot? I can't see how this is different from
> before.

If we drain first, then apply manifests, then reboot, here's why that's better: the etcd-quorum-guard pod will no longer be running on that host, so whether the new etcd static pod starts before the reboot or not doesn't matter. What matters is that the *next* host will not change anything, because the next host's drain will block while the first host's etcd-quorum-guard is not ready. Once etcd-quorum-guard becomes ready again, drain proceeds on the second master. If it does not become ready (because the etcd static pod is broken for some reason), drain stays blocked and the second master never gets the new manifests.
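A sketch of the mechanism that makes this blocking work, assuming a drain implemented on top of the Kubernetes eviction API (the function name and retry interval are illustrative): the API server answers 429 Too Many Requests when an eviction would violate a PDB, and drain simply keeps retrying until the budget allows it.

package drain

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictWithRetry keeps requesting an eviction until the PDB allows it.
// While the first master's etcd-quorum-guard pod is not ready, evicting
// the second master's guard pod would violate the budget, so the API
// server returns 429 and the second master's drain never completes.
func evictWithRetry(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) error {
	eviction := &policyv1beta1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
	}
	for {
		err := cs.PolicyV1beta1().Evictions(pod.Namespace).Evict(ctx, eviction)
		if err == nil {
			return nil // evicted; drain can move on to the next pod
		}
		if !apierrors.IsTooManyRequests(err) {
			return err // a real failure, not a PDB rejection
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
			// Budget still exhausted: etcd is degraded somewhere; retry.
		}
	}
}

That retry-on-429 behavior is what turns the quorum guard into a gate for the rolling master upgrade: a broken etcd static pod on one master leaves its guard pod not ready, which holds every subsequent master's drain indefinitely.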
Verifying this BZ would require a release with a bad etcd image. There are no log messages that would indicate the change in behavior (the separation of the drain and reboot functionality). Development verified it by running the upgrade 100 times. I have checked the CI e2e 4.2-to-4.3 upgrade jobs from when this BZ was fixed and have not seen any related failures, so I am considering this verified. We can reopen if the issue reappears.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062