Description of problem: MCO is doing an extra node reboot on 4.5 -> 4.6 upgrade. After a 4.5 install the nodes have 2 boots (initial RHCOS + pivoted). After upgrade to 4.6.fc5 the nodes have 2 more boots. It seems one of the upgrade reboots could be avoided.

STR:
* Launch 4.5
* Check `journalctl --list-boots` on any node
* Upgrade to 4.6
* Check `journalctl --list-boots` on any node

I'll link the collected must-gather in follow-up comments.
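For convenience, the per-node check can be wrapped in a loop like the one below. This is just a minimal sketch of the STR check above; it assumes `oc debug` access to the nodes and that `journalctl --list-boots` prints one line per boot with no header (as it does in the output pasted later in this bug), so the line count equals the boot count:

$ for node in $(oc get nodes -o name); do
    echo -n "${node}: "
    oc debug "${node}" -- chroot /host journalctl --list-boots 2>/dev/null | wc -l
  done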
I installed my own cluster and was able to reproduce this. Here is what I think is happening: worker nodes only have 3 boots, masters have 4. There are actually 3 rendered master configs vs. 2 for workers, and this is because the OLD machine-config-controller renders one last config before it dies. The only diff between those configs is the following two files:

/etc/kubernetes/static-pod-resources/etcd-member/ca.crt
/etc/kubernetes/static-pod-resources/etcd-member/metric-ca.crt

With the introduction of https://github.com/openshift/machine-config-operator/pull/1933 we removed the etcdCAData and etcdMetricCAData keys from the controllerconfig. This causes a new controllerconfig, stripped of those keys, to be generated when you upgrade to 4.6, which the controller then takes and uses to render a new machineconfig. I believe this happens because the new 4.6 MCO rolls out in this order:

1. RenderConfig
2. MachineConfigPools
3. MachineConfigDaemon
4. MachineConfigController
5. MachineConfigServer
6. RequiredPools

So before the new MCC takes over, the old MCC still has one last chance to render a new config and start the upgrade process. If my assessment is correct, I don't believe this is a regression, since we've always rolled out in this order. It's somewhat odd that this has never been reported before (or has it?): the MCO will roll out an extra config IF the controllerconfig has changed. Now that I think about it, I've seen quite a few must-gathers that exhibit similar behaviour. I guess most people don't look at the boots.
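To double-check this on a reproducer, comparing the two newest rendered master configs should show only the two etcd CA files above. A minimal sketch (the rendered-master names here are placeholders; substitute the hashes from your own cluster):

$ oc get machineconfigs --sort-by=.metadata.creationTimestamp | grep rendered-master
$ diff <(oc get mc rendered-master-<old-hash> -o yaml) <(oc get mc rendered-master-<new-hash> -o yaml)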
I took a quick look at the code and I think the steps are as follows:

1. The new MCO rolls out. It first syncs the controllerconfig to the new one as part of the RenderConfig step above.
2. The running (old) MCC's template controller has an informer on the controllerconfig, which, since the previous step updated the config, triggers a syncControllerConfig. This re-generates the base master config (the 00-master MC).
3. The running (old) MCC now sees an MC change, which triggers another sync to generate a new rendered config, and it instructs the master pool to start an update.
4. The new MCC rolls out and does the whole rendering process, creating the actual updated MC for the master pool. It waits until the previous update is finished and then starts the new (real) update.

I don't recall us modifying this behaviour recently, so I think we've always done it this way. I'm surprised that the controllerconfig being synced first by the old MCC hasn't caused us trouble in the past (as far as I know). Do you think this is high enough impact to warrant high priority as a BZ? It shouldn't affect anything currently other than incurring one extra reboot.
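If that reading is right, the intermediate rendered-master config should be stamped with the old controller version while the final one carries the new controller version. A rough way to eyeball this (assuming the rendered configs carry the machineconfiguration.openshift.io/generated-by-controller-version annotation, which I believe they do):

$ oc get mc --sort-by=.metadata.creationTimestamp \
    -o custom-columns='NAME:.metadata.name,GENERATED-BY:.metadata.annotations.machineconfiguration\.openshift\.io/generated-by-controller-version'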
>Do you think this is high enough impact to warrant high priority as a BZ?

I think it's pretty important. Every additional boot takes a lot of time, especially on bare metal. Also, a mix of MCC and operator versions is concerning and may bite us later.

Another problem I've hit is OKD-specific, but related to this. In OKD 4.6 we need to install NetworkManager-ovs on the hosts for networking to function correctly. For clean installs we've chosen to install it using OS extensions, and during the 4.5 -> 4.6 OKD upgrade we add an MC which contains the necessary extensions. But because of the additional reboot the nodes don't come up: they already use the new network configuration, but don't yet have the necessary packages installed (the new MCC hasn't generated Ignition with the 4.6 osImageURL yet). As a result the whole upgrade stops.
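For reference, the OKD machineconfig in question is roughly of the following shape. This is only a sketch: the MC name, pool label, Ignition version and the exact extension name are illustrative (written from memory) and may differ from what OKD actually ships.

$ cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-okd-extensions
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
  extensions:
    - NetworkManager-ovs
EOF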
Looking at this now with Jerry, what I'm feeling is that in this ordering:

RenderConfig
...
MachineConfigController

we really want both of these to happen before we start rolling out any changes to the pools. There were a few changes in the MCO around similar "version mismatch" issues:

- https://github.com/openshift/machine-config-operator/commit/c2ef2c1bad545eb8d60e3f57024526dfb49829df
- https://github.com/openshift/machine-config-operator/commit/c7d3d9a805a0932afbb4141600ff0038c480c59c

Does the RenderConfig not have the operator version embedded today? I think the simplest fix along those lines would be to embed it, and have the controller check it and ignore the render config if it doesn't match.
Sorry, I think it's the controllerconfig object that needs a version; the renderconfig is just what the operator uses.
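In the meantime, the mismatch window can at least be observed by hand during an upgrade: compare the version reported by the machine-config ClusterOperator against the generated-by-controller-version values from the `oc get mc` query in my earlier comment. A rough sketch:

$ oc get clusteroperator machine-config \
    -o jsonpath='{range .status.versions[*]}{.name}{"\t"}{.version}{"\n"}{end}'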
OK, since I had this in mind I just tried to dive in and do a fix: https://github.com/openshift/machine-config-operator/pull/2112
Tentatively setting this to 4.7 as this shouldn't block the existing 4.5 -> 4.6 upgrade. As discussed, there are a few approaches:

1. Backport a fix to 4.5 that has the 4.5 controller check which operator version generated the controllerconfig. If there is a mismatch, don't render the MC (basically it would wait for the new controller to roll out first).
   - This approach comes with the issue that we must block all 4.5 -> 4.6 upgrades without the fix.
2. Have the 4.6 operator pod "pause" the old controller pod (before it generates the new controllerconfig) until the daemonset finishes the rolling update for the pod, then resume operations.
   - We can potentially expand this so that all operations are paused while the new operator is rolling out pods. This may have other side effects.

Given the timelines I'm not sure we will be able to fix this for 4.6.
Following a conversation with Jerry, we will move this to 4.7 pending a conversation about the proposed change with the MCO team next week.
I think we should fix this for 4.7, marking blocker.
Verified on 4.7.0-0.nightly-2020-12-11-135127. Upgraded from 4.6.8 to 4.7.0-0.nightly-2020-12-11-135127 -- only one additional boot per node is observed.

Before the upgrade:

$ for i in $(oc get nodes -o name); do oc debug $i -- chroot /host journalctl --list-boots; done
Starting pod/ip-10-0-139-254us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 50766f40d79940d4ba3e00d02994e533 Fri 2020-12-11 21:52:58 UTC—Fri 2020-12-11 22:02:56 UTC
 0 c77f9a72e7044ec990173b52a8cdad0c Fri 2020-12-11 22:03:15 UTC—Fri 2020-12-11 22:30:20 UTC
Removing debug pod ...
Starting pod/ip-10-0-144-25us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 e6803cbd7a1f4bdb946abdaac0e62de0 Fri 2020-12-11 22:04:04 UTC—Fri 2020-12-11 22:10:26 UTC
 0 7900e377606b4e2c9a01520b0ddec718 Fri 2020-12-11 22:10:43 UTC—Fri 2020-12-11 22:30:32 UTC
Removing debug pod ...
Starting pod/ip-10-0-165-187us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 1855da4e7bc247aa865e21ef6ce6f898 Fri 2020-12-11 22:03:05 UTC—Fri 2020-12-11 22:10:22 UTC
 0 776ff364a2a447bc8e42abadeb1874ca Fri 2020-12-11 22:10:39 UTC—Fri 2020-12-11 22:30:45 UTC
Removing debug pod ...
Starting pod/ip-10-0-173-136us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 77fcc9652021428ebd8b12d4c76346c2 Fri 2020-12-11 21:52:46 UTC—Fri 2020-12-11 21:58:35 UTC
 0 332db58320bd4257a93de7e19da70ac8 Fri 2020-12-11 21:58:53 UTC—Fri 2020-12-11 22:30:57 UTC
Removing debug pod ...
Starting pod/ip-10-0-212-236us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 edc410c7d3874c899e28e5bb6f82b709 Fri 2020-12-11 21:52:46 UTC—Fri 2020-12-11 22:02:22 UTC
 0 57a062b876234798be5ace680cddd9d5 Fri 2020-12-11 22:02:39 UTC—Fri 2020-12-11 22:31:07 UTC
Removing debug pod ...
Starting pod/ip-10-0-221-39us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-1 628feefe3c2945b785b448aedc132270 Fri 2020-12-11 22:03:09 UTC—Fri 2020-12-11 22:04:50 UTC
 0 d09c0fb7e0294e099cf910e58f244ebf Fri 2020-12-11 22:05:07 UTC—Fri 2020-12-11 22:31:20 UTC
Removing debug pod ...

$ oc adm upgrade --force --to-image=registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-12-11-135127
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.7.0-0.nightly-2020-12-11-135127
$ watch oc get clusterversion
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-11-135127   True        False         4m52s   Cluster version is 4.7.0-0.nightly-2020-12-11-135127

After the upgrade:

$ for i in $(oc get nodes -o name); do oc debug $i -- chroot /host journalctl --list-boots; done
Starting pod/ip-10-0-139-254us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 50766f40d79940d4ba3e00d02994e533 Fri 2020-12-11 21:52:58 UTC—Fri 2020-12-11 22:02:56 UTC
-1 c77f9a72e7044ec990173b52a8cdad0c Fri 2020-12-11 22:03:15 UTC—Fri 2020-12-11 23:41:24 UTC
 0 0c41547572a54f53afe9bfe2e215fdf5 Fri 2020-12-11 23:41:42 UTC—Fri 2020-12-11 23:49:59 UTC
Removing debug pod ...
Starting pod/ip-10-0-144-25us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 e6803cbd7a1f4bdb946abdaac0e62de0 Fri 2020-12-11 22:04:04 UTC—Fri 2020-12-11 22:10:26 UTC
-1 7900e377606b4e2c9a01520b0ddec718 Fri 2020-12-11 22:10:43 UTC—Fri 2020-12-11 23:35:09 UTC
 0 53b3715f1c9547f68691bf1832609cd9 Fri 2020-12-11 23:35:26 UTC—Fri 2020-12-11 23:50:02 UTC
Removing debug pod ...
Starting pod/ip-10-0-165-187us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 1855da4e7bc247aa865e21ef6ce6f898 Fri 2020-12-11 22:03:05 UTC—Fri 2020-12-11 22:10:22 UTC
-1 776ff364a2a447bc8e42abadeb1874ca Fri 2020-12-11 22:10:39 UTC—Fri 2020-12-11 23:38:50 UTC
 0 ba0db9c1fe4e4770b6beb9a766ed654a Fri 2020-12-11 23:39:07 UTC—Fri 2020-12-11 23:50:06 UTC
Removing debug pod ...
Starting pod/ip-10-0-173-136us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 77fcc9652021428ebd8b12d4c76346c2 Fri 2020-12-11 21:52:46 UTC—Fri 2020-12-11 21:58:35 UTC
-1 332db58320bd4257a93de7e19da70ac8 Fri 2020-12-11 21:58:53 UTC—Fri 2020-12-11 23:32:04 UTC
 0 989640a9d5064e72a95cbdf9130f93eb Fri 2020-12-11 23:32:22 UTC—Fri 2020-12-11 23:50:09 UTC
Removing debug pod ...
Starting pod/ip-10-0-212-236us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 edc410c7d3874c899e28e5bb6f82b709 Fri 2020-12-11 21:52:46 UTC—Fri 2020-12-11 22:02:22 UTC
-1 57a062b876234798be5ace680cddd9d5 Fri 2020-12-11 22:02:39 UTC—Fri 2020-12-11 23:36:46 UTC
 0 3cfc0873b2c24284bc42fd0d30d4d696 Fri 2020-12-11 23:37:04 UTC—Fri 2020-12-11 23:50:12 UTC
Removing debug pod ...
Starting pod/ip-10-0-221-39us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
-2 628feefe3c2945b785b448aedc132270 Fri 2020-12-11 22:03:09 UTC—Fri 2020-12-11 22:04:50 UTC
-1 d09c0fb7e0294e099cf910e58f244ebf Fri 2020-12-11 22:05:07 UTC—Fri 2020-12-11 23:31:31 UTC
 0 9ee0a30ec09049d4806254ca1c7f5157 Fri 2020-12-11 23:31:49 UTC—Fri 2020-12-11 23:50:16 UTC
Removing debug pod ...
$
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633