Bug 1873288

Summary: Changing Cluster-Wide Pull Secret Does Not Trigger Updates In Kubelet Filesystem
Product: OpenShift Container Platform
Component: Machine Config Operator
Version: 4.5
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Steve Kuznetsov <skuznets>
Assignee: Antonio Murdaca <amurdaca>
QA Contact: Michael Nguyen <mnguyen>
CC: aaleman, erich, jerzhang, mburke, mkrejci, walters, wking
Doc Type: No Doc Update
Type: Bug
Bug Blocks: 1897575
Last Closed: 2021-02-24 15:16:47 UTC

Description Steve Kuznetsov 2020-08-27 18:51:10 UTC
Changing the secret `pull-secret` in the `openshift-config` namespace does not update the file at `/var/lib/kubelet/config.json` in the kubelet filesystem. This means it is impossible to rotate the cluster-wide pull secret.
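
For reference, a reproduction sketch (the secret file and node name are placeholders):

$ oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=new-pull-secret.json
$ # even after waiting for a rollout, the credentials file on the node is unchanged:
$ oc debug node/<node-name> -- chroot /host cat /var/lib/kubelet/config.json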

Comment 1 Colin Walters 2020-08-27 19:51:09 UTC
It's possible there's a bug here, but note that it won't roll out *instantly* - the rollout happens via the MachineConfigPool, one node at a time.

What does `oc describe machineconfigpool/worker` show?

(xref https://github.com/openshift/enhancements/pull/159 )

Comment 2 Steve Kuznetsov 2020-08-27 20:19:03 UTC
$ oc --context build01 describe machineconfigpool/worker
Name:         worker
Namespace:    
Labels:       custom-kubelet=enabled
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-01-30T13:44:57Z
  Generation:          33
  Resource Version:    198730816
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 33d9726a-2342-4331-acd5-8f009630cf09
Spec:
  Configuration:
    Name:  rendered-worker-8d2719b5c152a88631dcf644af02b186
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-33d9726a-2342-4331-acd5-8f009630cf09-registries
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-01-30T13:45:46Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-08-19T12:15:20Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-08-19T12:15:20Z
    Message:               All nodes are updating to rendered-worker-5af6f1b2f309120561a62877926e7649
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-08-19T18:42:48Z
    Message:               Node ip-10-0-130-141.ec2.internal is reporting: "failed to drain node (5 tries): timed out waiting for the condition: error when evicting pod \"search-0\": global timeout reached: 1m30s"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-08-19T18:42:48Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-d24656669ca45264a78b8306e015c863
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-33d9726a-2342-4331-acd5-8f009630cf09-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-33d9726a-2342-4331-acd5-8f009630cf09-registries
  Degraded Machine Count:     1
  Machine Count:              17
  Observed Generation:        33
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:                       <none>

Comment 3 Steve Kuznetsov 2020-08-27 20:19:46 UTC
It's been about a week since we updated the secret, so we're not expecting instant :)

Comment 4 Colin Walters 2020-08-27 20:22:26 UTC
OK so a definite MCO issue is basically:

- We randomly pick one node to update
- We keep trying

And, as appears to be the case here, if we fail to evict a pod on that one randomly chosen node, we just get stuck.

But why is search-0 failing to be evicted?
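
To dig into that, something like the following (the pod's namespace is unknown here, so it's a placeholder):

$ oc get pods -A -o wide | grep search-0     # find the pod's namespace and node
$ oc describe pod -n <namespace> search-0    # check events for the eviction/scheduling failure
$ oc get pdb -A                              # check whether a PodDisruptionBudget is blocking eviction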

Comment 5 Colin Walters 2020-08-27 21:17:23 UTC
Steve is going to write up something more but AIUI it's basically "broken node pull secret can cause deadlock":

- DPTP wanted to rotate the pull secret on the nodes
- Before that change had finished rolling out, the old secret expired
- Image pulls then started to fail on nodes with the old config
- The MCO couldn't drain pods (like search) from working nodes because the rescheduled pods failed to pull their images on other nodes

Deadlock.

And when we scale up new nodes the MCO will serve the *old* config until it's fully rolled out, so scaleup won't help.

One option is to hack the new pull secret onto the nodes directly, then [use the force](https://github.com/openshift/machine-config-operator/pull/1086) to tell the MCO not to go degraded later.

Or, we need an API to tell the MCO "please serve the pending config to new nodes, old one is broken" - this is what we hit in https://github.com/openshift/machine-config-operator/issues/1619
Then we could scale up new nodes and delete the old ones.
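
A rough sketch of the first option (paths are illustrative; the force file is the one added by the linked PR):

$ oc debug node/<node-name>
sh-4.4# chroot /host
sh-4.4# cp /tmp/new-config.json /var/lib/kubelet/config.json   # hack in the new pull secret
sh-4.4# touch /run/machine-config-daemon-force                 # ask the MCD to skip validation so it won't go degraded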

Comment 10 Colin Walters 2020-11-23 21:47:08 UTC
While this bug discussed pull secrets expiring and things like that, there are multiple problems that combine into a deadlock when trying to roll out a fix for that.

The MCO fix here is a generic change that helps ensure scaling up new nodes can pull a new config (if it's been rolled out to at least one node):
https://github.com/openshift/machine-config-operator/pull/2035

This *probably* would have helped this situation, and fixes many others besides.

The way to verify this is (and note unfortunately we don't have upstream CI for this, but the code is very simple):

- Start a config change by creating a dummy MachineConfig (or any change you want)
- Watch `oc get machineconfigpool/worker` and verify the pool is targeting the new config
- Wait for at least one worker node to be updated (updated count = 1)
- *Before* the update is complete, scale up a worker machineset by one (at least)

You should see the scaled-up machine boot directly into the new config. For example, with `oc debug node/` check that the number of boots is 2, not 3 or more.
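
For step 1 above, a dummy MachineConfig can be as small as this (the name, filename, and file path are made up):

$ cat dummy-mc.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-dummy
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/dummy-verification-file
          mode: 420
          contents:
            source: data:,dummy
$ oc apply -f dummy-mc.yaml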

Comment 11 Michael Nguyen 2020-11-23 22:15:34 UTC
Verified on 4.7.0-0.nightly-2020-11-23-074526.

I used the update of the pull-secret to initiate the config change. Then, as the updatedmachinecount became 1 in the machine config pool, I scaled up the machineset. Once the new machine joined the cluster, I checked the number of reboots and verified that it got the correct pull secret in /var/lib/kubelet/config.json.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-23-074526   True        False         73m     Cluster version is 4.7.0-0.nightly-2020-11-23-074526

$ ./oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=pull-secret 
secret/pull-secret data updated
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready                      master   82m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready                      worker   74m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                      master   82m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready,SchedulingDisabled   worker   74m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                      worker   74m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready,SchedulingDisabled   master   82m   v1.19.2+b005cfc
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4254ce9d0ee58689c24cdf92e7858799   False     True       False      3              0                   0                     0                      81m
worker   rendered-worker-10b150e04cf57756f0edfc003b6d9fd4   False     True       False      3              0                   0                     0                      81m
$ watch oc get mcp
$ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready                      master   83m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                      master   84m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                      worker   75m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready,SchedulingDisabled   master   83m   v1.19.2+b005cfc


$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4254ce9d0ee58689c24cdf92e7858799   False     True       False      3              1                   1                     0                      83m
worker   rendered-worker-10b150e04cf57756f0edfc003b6d9fd4   False     True       False      3              1                   1                     0                      83m
$ oc -n openshift-machine-api get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2b   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2c   1         1         1       1           93m
mnguyen477-rt44t-worker-us-west-2d   0         0                             93m
$ oc -n openshift-machine-api scale --replicas=2  machineset/mnguyen477-rt44t-worker-us-west-2a
machineset.machine.openshift.io/mnguyen477-rt44t-worker-us-west-2a scaled
$ oc -n openshift-machine-api get  machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         1         1       1           93m
$ oc -n openshift-machine-api get  machines
NAME                                       PHASE     TYPE        REGION      ZONE         AGE
mnguyen477-rt44t-master-0                  Running   m5.xlarge   us-west-2   us-west-2a   94m
mnguyen477-rt44t-master-1                  Running   m5.xlarge   us-west-2   us-west-2b   94m
mnguyen477-rt44t-master-2                  Running   m5.xlarge   us-west-2   us-west-2c   94m
mnguyen477-rt44t-worker-us-west-2a-wl9k5   Running   m5.large    us-west-2   us-west-2a   81m
mnguyen477-rt44t-worker-us-west-2b-tvf98   Running   m5.large    us-west-2   us-west-2b   81m
mnguyen477-rt44t-worker-us-west-2c-6h5s7   Running   m5.large    us-west-2   us-west-2c   81m
$ oc get nodes
NAME                                         STATUS                        ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    NotReady,SchedulingDisabled   master   85m   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready,SchedulingDisabled      worker   77m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready                         master   86m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready                         worker   77m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready                         worker   77m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready                         master   85m   v1.19.2+b005cfc
$  oc -n openshift-machine-api get  machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         2         1       1           96m
$ oc get nodes
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc
ip-10-0-131-176.us-west-2.compute.internal   NotReady   worker   20s   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready      worker   82m   v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready      master   90m   v1.19.2+b005cfc
$ watch oc -n openshift-machine-api get  machineset/mnguyen477-rt44t-worker-us-west-2a
$  oc -n openshift-machine-api get  machineset/mnguyen477-rt44t-worker-us-west-2a
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen477-rt44t-worker-us-west-2a   2         2         2       2           100m
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-130-35.us-west-2.compute.internal    Ready    master   91m    v1.19.2+b005cfc
ip-10-0-131-176.us-west-2.compute.internal   Ready    worker   113s   v1.19.2+b005cfc
ip-10-0-131-225.us-west-2.compute.internal   Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-163-93.us-west-2.compute.internal    Ready    master   92m    v1.19.2+b005cfc
ip-10-0-165-95.us-west-2.compute.internal    Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-211-47.us-west-2.compute.internal    Ready    worker   83m    v1.19.2+b005cfc
ip-10-0-216-64.us-west-2.compute.internal    Ready    master   91m    v1.19.2+b005cfc
$ oc debug node/ip-10-0-131-176.us-west-2.compute.internal
Starting pod/ip-10-0-131-176us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# last reboot
reboot   system boot  4.18.0-193.29.1. Mon Nov 23 22:02   still running
reboot   system boot  4.18.0-193.28.1. Mon Nov 23 21:59 - 22:01  (00:01)

wtmp begins Mon Nov 23 21:59:49 2020
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

Comment 14 Yu Qi Zhang 2021-01-06 16:49:42 UTC
After further consideration, given that the previous upgrade detail was not documented, I'm inclined to say this does not require a doc update.

Comment 16 errata-xmlrpc 2021-02-24 15:16:47 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633