Bug 1953627 - Failed to upgrade to 4.6.25 from 4.6.18 due to the machine-config failure
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Yu Qi Zhang
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-26 14:32 UTC by Angel Fortunato Acosta Bencomo
Modified: 2021-06-20 12:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-29 16:27:30 UTC
Target Upstream Version:
Embargoed:



Description Angel Fortunato Acosta Bencomo 2021-04-26 14:32:14 UTC
Description of problem:

machine-config cluster operator degraded due to controller version mismatch

~~~
$ omg get co machine-config -o yaml
...
  - lastTransitionTime: '2021-04-22T22:41:00Z'
    message: 'Unable to apply 4.6.25: timed out waiting for the condition during syncRequiredMachineConfigPools:
      pool master has not progressed to latest configuration: controller version mismatch
      for 98-master-generated-kubelet expected d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af
      has 14a2b82d9f4c4d8b423f8f05f6926778ef36870d: all 3 nodes are at latest configuration
      rendered-master-381b6c37f8f8020f2e740ba44a1460a2, retrying'
    reason: RequiredPoolsFailed
    status: 'True'
    type: Degraded
...
extension:
    lastSyncError: 'pool master has not progressed to latest configuration: controller
      version mismatch for 98-master-generated-kubelet expected d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af
      has 14a2b82d9f4c4d8b423f8f05f6926778ef36870d: all 3 nodes are at latest configuration
      rendered-master-381b6c37f8f8020f2e740ba44a1460a2, retrying'
    master: all 3 nodes are at latest configuration rendered-master-381b6c37f8f8020f2e740ba44a1460a2
    worker: all 13 nodes are at latest configuration rendered-worker-e08dcb17ae6631b16767bdd8b61c8e93
...
~~~
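
The two hashes in the Degraded message are the ones to compare. As a minimal sketch (not part of the original report), the following pulls the machineconfig name and both controller versions out of the message. The regular expression is an assumption based only on the wording quoted above; other MCO versions may phrase the message differently.

```python
import re

# Assumed pattern, derived only from the message quoted in this report.
MISMATCH_RE = re.compile(
    r"controller version mismatch for (?P<name>\S+) "
    r"expected (?P<expected>[0-9a-f]+) has (?P<actual>[0-9a-f]+)"
)


def parse_mismatch(message):
    """Return name/expected/actual from a mismatch message, or None if absent."""
    # The YAML output folds the message across lines, so collapse whitespace first.
    flat = " ".join(message.split())
    match = MISMATCH_RE.search(flat)
    return match.groupdict() if match else None


message = """Unable to apply 4.6.25: timed out waiting for the condition during syncRequiredMachineConfigPools:
      pool master has not progressed to latest configuration: controller version mismatch
      for 98-master-generated-kubelet expected d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af
      has 14a2b82d9f4c4d8b423f8f05f6926778ef36870d: all 3 nodes are at latest configuration
      rendered-master-381b6c37f8f8020f2e740ba44a1460a2, retrying"""

result = parse_mismatch(message)
print(result)
```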


Version-Release number of selected component (if applicable):
Version: 4.6.25
Version: 4.6.18 


Steps to Reproduce:
1. Upgrade to 4.6.25 from 4.6.18


Actual results:

~~~
$ omg get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           True       True         2m52s  Unable to apply 4.6.25: the cluster operator machine-config has not yet successfully rolled out
~~~

~~~
$ omg get co
NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE
machine-config                            4.6.18   False      True         True      21h
~~~

Expected results:
Upgrade to 4.6.25 successfully.


Additional info:
Attached are the "01-master-kubelet_content.json", "98-master-generated-kubelet_content.json" and "machine-config-operator-57c965559d-66sl2.log" files.

Comment 5 Yu Qi Zhang 2021-04-26 23:50:27 UTC
Hi,

The linked error

```
'Unable to apply 4.6.25: timed out waiting for the condition during syncRequiredMachineConfigPools:
      pool master has not progressed to latest configuration: controller version mismatch
      for 98-master-generated-kubelet expected d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af
      has 14a2b82d9f4c4d8b423f8f05f6926778ef36870d: all 3 nodes are at latest configuration
      rendered-master-381b6c37f8f8020f2e740ba44a1460a2, retrying'
```

is basically saying that a previous version of the MCO created a machineconfig based on a kubeletconfig, but the new version did not regenerate it, as seen in the output of your later command:

98-master-generated-kubelet                       14a2b82d9f4c4d8b423f8f05f6926778ef36870d  3.1.0            10d
98-worker-generated-kubelet                       eab9c35dfbeb0d21be6e1db3887acbbb93592d34  3.1.0            10d

That is very odd: neither the master nor the worker generated kubelet machineconfig was regenerated by the new controller version (d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af), unlike all the other non-rendered configs.
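
To spot such stale configs across the whole list, a sketch along these lines could help. It assumes the input has the shape of `oc get machineconfig -o json` and that the controller version is recorded in the `machineconfiguration.openshift.io/generated-by-controller-version` annotation. The function name `stale_machineconfigs` is hypothetical, and the `00-master` entry in the sample is illustrative only, not data from this cluster.

```python
import json

ANNOTATION = "machineconfiguration.openshift.io/generated-by-controller-version"


def stale_machineconfigs(mc_list, expected):
    """Return (name, version) for configs generated by a different controller version."""
    stale = []
    for item in mc_list.get("items", []):
        meta = item.get("metadata") or {}
        version = (meta.get("annotations") or {}).get(ANNOTATION)
        # Configs without the annotation (for example user-created ones) are skipped.
        if version is not None and version != expected:
            stale.append((meta.get("name"), version))
    return stale


expected = "d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af"

# Sample input: the two kubelet entries use the hashes quoted in this bug;
# the 00-master entry is a made-up example of an up-to-date config.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "00-master", "annotations": {
    "machineconfiguration.openshift.io/generated-by-controller-version":
      "d5dc2b519aed5b3ed6a6ab9e7f70f33740f9f8af"}}},
  {"metadata": {"name": "98-master-generated-kubelet", "annotations": {
    "machineconfiguration.openshift.io/generated-by-controller-version":
      "14a2b82d9f4c4d8b423f8f05f6926778ef36870d"}}},
  {"metadata": {"name": "98-worker-generated-kubelet", "annotations": {
    "machineconfiguration.openshift.io/generated-by-controller-version":
      "eab9c35dfbeb0d21be6e1db3887acbbb93592d34"}}}
]}
""")

stale = stale_machineconfigs(sample, expected)
for name, version in stale:
    print(f"{name} was generated by {version}, expected {expected}")
```

On a live cluster the sample could be replaced with the parsed output of `oc get machineconfig -o json`.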

I have a few questions:

1. were those ever modified manually?
2. could you post the kubeletconfigs on the system?
3. could you post the machineconfigcontroller pod logs? (oc logs -n openshift-machine-config-operator machine-config-controller-xxx)
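
For reference, the data requested in questions 2 and 3 could be gathered with commands like the ones this sketch prints. It only builds and prints the command lines and does not run them. The helper name `collection_commands` is hypothetical, and the pod name is the placeholder from the question, to be replaced with the real controller pod name shown by the pod listing.

```python
import shlex

NAMESPACE = "openshift-machine-config-operator"


def collection_commands(controller_pod):
    """Build the oc command lines for the kubeletconfigs and controller logs."""
    return [
        # Question 2: dump every KubeletConfig object on the cluster.
        ["oc", "get", "kubeletconfig", "-o", "yaml"],
        # List the pods to find the actual controller pod name.
        ["oc", "get", "pods", "-n", NAMESPACE],
        # Question 3: logs of the machine-config-controller pod.
        ["oc", "logs", "-n", NAMESPACE, controller_pod],
    ]


commands = collection_commands("machine-config-controller-xxx")
for cmd in commands:
    print(shlex.join(cmd))
```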

