Changing the secret `pull-secret` in the `openshift-config` namespace does not change the file at `/var/lib/kubelet/config.json` on the nodes' filesystems. This makes it impossible to rotate the cluster-wide pull secret.
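For reference, this is roughly the rotation being attempted (the local file name here is a placeholder for a dockerconfig file containing the new credentials):

$ oc set data secret/pull-secret -n openshift-config \
    --from-file=.dockerconfigjson=new-pull-secret.json

# Expected (but not observed): the kubelet copy on each node follows suit
$ oc debug node/<node> -- chroot /host cat /var/lib/kubelet/config.json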
It's possible there's a bug here, but note that it won't roll out *instantly* - the rollout happens via the machineconfigpool, one node at a time. What does `oc describe machineconfigpool/worker` show? (xref https://github.com/openshift/enhancements/pull/159 )
$ oc --context build01 describe machineconfigpool/worker
Name:         worker
Namespace:
Labels:       custom-kubelet=enabled
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-01-30T13:44:57Z
  Generation:          33
  Resource Version:    198730816
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 33d9726a-2342-4331-acd5-8f009630cf09
Spec:
  Configuration:
    Name:  rendered-worker-8d2719b5c152a88631dcf644af02b186
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-registries
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
  Paused:  false
Status:
  Conditions:
    Last Transition Time:  2020-01-30T13:45:46Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-08-19T12:15:20Z
    Message:
    Reason:
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-08-19T12:15:20Z
    Message:               All nodes are updating to rendered-worker-5af6f1b2f309120561a62877926e7649
    Reason:
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-08-19T18:42:48Z
    Message:               Node ip-10-0-130-141.ec2.internal is reporting: "failed to drain node (5 tries): timed out waiting for the condition: error when evicting pod \"search-0\": global timeout reached: 1m30s"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-08-19T18:42:48Z
    Message:
    Reason:
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-d24656669ca45264a78b8306e015c863
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-registries
  Degraded Machine Count:     1
  Machine Count:              17
  Observed Generation:        33
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:  <none>
It's been about a week since we updated the secret, so we're not expecting it to be instant :)
OK, so a definite MCO issue here is basically:
- We randomly pick one node to update
- We keep retrying that node

And, as appears to be the case here, if we fail to evict a pod on that one random node we just get stuck. But why is search-0 failing to be evicted? (A rough way to check is sketched below.)
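A common reason an eviction times out is a PodDisruptionBudget that can't be satisfied; a quick check might be something like this (the namespace of search-0 is an assumption, adjust as needed):

$ oc get pdb --all-namespaces
$ oc describe pod/search-0 -n <namespace>
$ oc get events -n <namespace> --field-selector involvedObject.name=search-0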
Steve is going to write up something more, but AIUI it's basically "broken node pull secret can cause deadlock":
- DPTP wanted to rotate the pull secret on the nodes
- Before that change had finished rolling out, the old one expired
- Images then started failing to pull on nodes with the old config
- The MCO couldn't drain pods (like search) from the working nodes because the other nodes failed to pull the image

Deadlock. And when we scale up new nodes the MCO will serve the *old* config until the rollout has fully completed, so scaleup won't help.

We need to either hack the new pull secret onto the nodes (roughly as sketched below), then [use the force](https://github.com/openshift/machine-config-operator/pull/1086) to tell the MCO not to go degraded later. Or we need an API to tell the MCO "please serve the pending config to new nodes, the old one is broken" - this is what we hit in https://github.com/openshift/machine-config-operator/issues/1619 . Then we could scale up new nodes and delete the old ones.
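The manual "hack in the new pull secret" step could look something like this per node (an illustrative sketch, not a supported flow; how you transfer the file onto the node and whether a restart is needed are assumptions):

# Extract the new pull secret locally (note the escaped dot in the key name)
$ oc get secret/pull-secret -n openshift-config \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > config.json

# Then on each stuck node, replace the kubelet copy with those contents
$ oc debug node/<node>
sh-4.2# chroot /host
sh-4.4# vi /var/lib/kubelet/config.json   # paste in the new contents
sh-4.4# systemctl restart kubelet         # may not even be required; the file is read at pull time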
While this bug discussed pull secrets expiring and the like, there were multiple problems that led to a deadlock when trying to roll out a fix. The MCO fix here is a generic change that helps ensure scaled-up new nodes can pull a new config (as long as it has been rolled out to at least one node): https://github.com/openshift/machine-config-operator/pull/2035

This *probably* would have helped this situation, and it fixes many others besides.

The way to verify this is (and note unfortunately we don't have upstream CI for this, but the code is very simple):
- Start a config change by creating a dummy MachineConfig (or any change you want; see the sketch below)
- Watch `oc get machineconfigpool/worker` and verify the pool is targeting the new config
- Wait for at least one worker node to be updated (updated count = 1)
- *Before* the update is complete, scale up a worker machineset by one (at least)

You should see the scaled-up machine boot straight into the new config. For example, with `oc debug node/` the number of boots should be 2, not 3 or more.
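A minimal dummy MachineConfig to kick off the worker pool update might look like this (the name, file path, and contents are arbitrary; the Ignition version should match what the cluster expects):

$ cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-dummy-test
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - path: /etc/dummy-test-file
        mode: 0644
        contents:
          source: data:,dummy
EOF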
Verified on 4.7.0-0.nightly-2020-11-23-074526. I used an update of the pull-secret to initiate the config change. Then, as the updatedmachinecount became 1 in the machine config pool, I scaled up the machineset. Once the new machine joined the cluster, I checked the number of reboots and verified that it got the correct pull secret in /var/lib/kubelet/config.json.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-23-074526   True        False         73m     Cluster version is 4.7.0-0.nightly-2020-11-23-074526

$ ./oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=pull-secret
secret/pull-secret data updated

$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready                      master   82m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready                      worker   74m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                      master   82m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready,SchedulingDisabled   worker   74m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                      worker   74m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready,SchedulingDisabled   master   82m   v1.19.2+b005cfc

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4254ce9d0ee58689c24cdf92e7858799   False     True       False      3              0                   0                     0                      81m
worker   rendered-worker-10b150e04cf57756f0edfc003b6d9fd4   False     True       False      3              0                   0                     0                      81m

$ watch oc get mcp

$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready                      master   83m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                      master   84m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready,SchedulingDisabled   master   83m   v1.19.2+b005cfc

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4254ce9d0ee58689c24cdf92e7858799   False     True       False      3              1                   1                     0                      83m
worker   rendered-worker-10b150e04cf57756f0edfc003b6d9fd4   False     True       False      3              1                   1                     0                      83m

$ oc -n openshift-machine-api get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2b   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2c   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2d   0         0                             93m

$ oc -n openshift-machine-api scale --replicas=2 machineset/mnguyen477-rt44t-worker-us-west-2a
machineset.machine.openshift.io/mnguyen477-rt44t-worker-us-west-2a scaled

$ oc -n openshift-machine-api get machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         1         1       1           93m

$ oc -n openshift-machine-api get machines
NAME                                       PHASE     TYPE        REGION      ZONE         AGE
mnguyen477-rt44t-master-0                  Running   m5.xlarge   us-west-2   us-west-2a   94m
mnguyen477-rt44t-master-1                  Running   m5.xlarge   us-west-2   us-west-2b   94m
mnguyen477-rt44t-master-2                  Running   m5.xlarge   us-west-2   us-west-2c   94m
mnguyen477-rt44t-worker-us-west-2a-wl9k5   Running   m5.large    us-west-2   us-west-2a   81m
mnguyen477-rt44t-worker-us-west-2b-tvf98   Running   m5.large    us-west-2   us-west-2b   81m
mnguyen477-rt44t-worker-us-west-2c-6h5s7   Running   m5.large    us-west-2   us-west-2c   81m

$ oc get nodes
NAME                                         STATUS                        ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    NotReady,SchedulingDisabled   master   85m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready,SchedulingDisabled      worker   77m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                         master   86m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready                         worker   77m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                         worker   77m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready                         master   85m   v1.19.2+b005cfc

$ oc -n openshift-machine-api get machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         2         1       1           96m

$ oc get nodes
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc
ip-10-0-131-176.us-west-2.compute.internal   NotReady   worker   20s   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc

$ watch oc -n openshift-machine-api get machineset/mnguyen477-rt44t-worker-us-west-2a

$ oc -n openshift-machine-api get machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         2         2       2           100m

$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready    master   91m    v1.19.2+b005cfc
ip-10-0-131-176.us-west-2.compute.internal   Ready    worker   113s   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready    master   92m    v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready    master   91m    v1.19.2+b005cfc

$ oc debug node/ip-10-0-131-176.us-west-2.compute.internal
Starting pod/ip-10-0-131-176us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# last reboot
reboot   system boot  4.18.0-193.29.1. Mon Nov 23 22:02   still running
reboot   system boot  4.18.0-193.28.1. Mon Nov 23 21:59 - 22:01  (00:01)

wtmp begins Mon Nov 23 21:59:49 2020
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
After further consideration, given that the previous upgrade detail was not documented, I'm inclined to say this does not require a doc update.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633