Bug 1946584
| Summary: | Machine-config controller fails to generate MC, when machine config pool with dashes in name presents under the cluster | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sabina Aledort <saledort> | ||||||||
| Component: | Node | Assignee: | Qi Wang <qiwan> | ||||||||
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> | ||||||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||||||
| Severity: | high | ||||||||||
| Priority: | high | CC: | alukiano, aos-bugs, dcain, jerzhang, nagrawal, oarribas, qiwan, schoudha, umohnani, william.caban | ||||||||
| Version: | 4.8 | ||||||||||
| Target Milestone: | --- | ||||||||||
| Target Release: | 4.8.0 | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | No Doc Update | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | |||||||||||
| : | 2008588 | Environment: | |||||||||
| Last Closed: | 2021-07-27 22:57:48 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | |||||||||||
| Bug Blocks: | 2008588 | ||||||||||
Description
Sabina Aledort
2021-04-06 12:59:46 UTC
@msivak Assigned to you since it's a compute test failing.
Created attachment 1769560: tests-artifacts
Created attachment 1770195: PAO must gather
There are three worker nodes in the cluster:
cnfdd5 - with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-cnf: ""
cnfdd6 - with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-cnf: ""
cnfdd7 - with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-duprofile: ""
There are 2 performance profiles:
perf-example.yaml with nodeSelector as node-role.kubernetes.io/worker-duprofile: ""
performance.yaml with nodeSelector as node-role.kubernetes.io/worker-cnf: ""
This means that cnfdd7 should be targeted and the profile should be applied to it.
*** Bug 1946591 has been marked as a duplicate of this bug. ***
In performance-perf-example.yaml I noticed the following:
status:
  conditions:
  - lastTransitionTime: "2021-04-07T14:50:58Z"
    message: 'could not get kubelet config key: error converting kubelet to int: strconv.Atoi:
      parsing "kubelet": invalid syntax'
    status: "False"
    type: Failure
*** Bug 1946589 has been marked as a duplicate of this bug. ***
1. I checked the node labels, the MCP selectors in the performance profile, and the node selectors in the target MCP. Based on all of that, the profile should have been applied to node cnfdd7.
2. I tried to recreate the issue in my environment. I installed the same MCP, configured a node with the same labels, and created the performance profile, which was applied without any issue. I ran the cnf-tests in discovery mode, and the only tests that failed did so because the stalld daemon was running on the host. This is a known issue and is being looked into here: https://bugzilla.redhat.com/show_bug.cgi?id=1949027
It looks like a bug in the machine config daemon; I saw the same issue on a different machine. This is where it fails: https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/helpers.go#L179, so it is worth checking the relevant annotations on the kubelet CR and the generated MC.
1. Once we create the first pool with a dash in its name (worker-cnf) and PAO creates a KubeletConfig for it, all is good.
2. We create a second pool, and PAO creates a KubeletConfig for it.
a. The generated name for the first MC will be 99-worker-cnf-generated-kubelet. The code thinks that the name carries a suffix (because our pool name contains a dash) - https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/kubelet_config_controller.go#L572 - and creates an additional MC for it with the name 99-worker-cnf-generated-kubelet-kubelet ("kubelet" is taken as the suffix), with the annotations:
machineconfiguration.openshift.io/mc-name-suffix: kubelet
b. Once it tries to generate the MC for the new kubelet config, it fails under https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/helpers.go#L179
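The failure sequence described in this comment can be illustrated with a minimal Go sketch. This is not the MCO code at the linked lines: `naiveSuffix` is a hypothetical helper, and the rule it applies (split the managed key on dashes and treat the last token as the suffix when there are more than four tokens) is an assumption based on the description in this comment. The key names follow the generated-kubelet naming seen in the `oc get mc` output in this report.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// naiveSuffix is a hypothetical helper, NOT the MCO implementation.
// Assumption: a managed key looks like 99-<pool>-generated-kubelet[-<n>],
// so any key with more than four dash-separated tokens carries a suffix.
// That assumption breaks as soon as <pool> itself contains a dash.
func naiveSuffix(managedKey string) string {
	parts := strings.Split(managedKey, "-")
	if len(parts) > 4 {
		return parts[len(parts)-1]
	}
	return ""
}

func main() {
	keys := []string{
		"99-worker-generated-kubelet",     // pool name without a dash
		"99-worker-cnf-generated-kubelet", // pool name with a dash
	}
	for _, key := range keys {
		suffix := naiveSuffix(key)
		if suffix == "" {
			fmt.Printf("%s: no suffix\n", key)
			continue
		}
		// The suffix is later expected to be numeric; "kubelet" is not.
		if _, err := strconv.Atoi(suffix); err != nil {
			fmt.Printf("%s: error converting %s to int: %v\n", key, suffix, err)
			continue
		}
		fmt.Printf("%s: suffix %s\n", key, suffix)
	}
}
```

For the dashed pool name, the sketch extracts `kubelet` as the suffix, and `strconv.Atoi` rejects it with `parsing "kubelet": invalid syntax`, which matches the message in the KubeletConfig status condition reported earlier in this bug.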
Given the relevant code lives in the kubeletconfigcontroller, moving over to the node team to take a look.
*** Bug 1946588 has been marked as a duplicate of this bug. ***
Hello, will this be addressed in the 4.8 GA? Not having dashes seems like a regression, as this worked in previous releases. Example: this used to work: ran-du-eng1-smci00-profile0, confirmed on 4.6.
The target release is set to 4.8.0 so it will be in the 4.8 GA.
Can you confirm that statement? I tried this in the latest 4.8.0-rc.3 and it for sure did not work. Which RC will the fix be contained in? ran-du-eng1-smci00-profile0 did not work. ran.du.fec3.dell03.profile0 worked.
Correction: ran-du-eng1-smci00-profile0 did not work. ran.du.eng1.smci00.profile0 worked. Cluster version is 4.8.0-rc.3.
@schoudha Do you know if the fix is going to be in 4.8 GA?
I verified the fix is in 4.8.0-0.ci. I created two KubeletConfig objects and the suffix is as expected.
$ oc get mc
NAME                                GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
99-worker-cnf-generated-kubelet     29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3m21s
99-worker-cnf-generated-kubelet-1   29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4s
$ oc describe kubeletconfig.machineconfiguration.openshift.io/worker-cnf
Status:
  Conditions:
    Last Transition Time:  2021-07-09T18:01:55Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:                    <none>
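For readers who want to see how the two names in the `oc get mc` output above can be told apart without tripping over dashes in the pool name, here is a minimal Go sketch. It is not taken from the actual fix, which this report does not describe: `suffixAfterMarker` and the `marker` constant are hypothetical, and the approach (anchor on the fixed `-generated-kubelet` segment instead of counting dashes) is only one possible way to do it.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is the fixed segment assumed to follow the pool name in a
// generated kubelet MachineConfig name (an assumption for this sketch).
const marker = "-generated-kubelet"

// suffixAfterMarker is a hypothetical helper, NOT the MCO fix.
// It returns whatever follows "-generated-kubelet-" in the name, or an
// empty suffix when the name ends at the marker. The boolean is false
// when the name does not have the expected shape.
func suffixAfterMarker(name string) (string, bool) {
	idx := strings.LastIndex(name, marker)
	if idx < 0 {
		return "", false
	}
	rest := name[idx+len(marker):]
	if rest == "" {
		return "", true
	}
	if !strings.HasPrefix(rest, "-") {
		return "", false
	}
	return rest[1:], true
}

func main() {
	for _, name := range []string{
		"99-worker-cnf-generated-kubelet",
		"99-worker-cnf-generated-kubelet-1",
	} {
		suffix, ok := suffixAfterMarker(name)
		fmt.Printf("%s: suffix=%q ok=%v\n", name, suffix, ok)
	}
}
```

Because the lookup is anchored on the marker, the dash inside `worker-cnf` has no effect on which part of the name is read as the suffix.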
@Qi Wang it will be released in 4.8 GA as the target release is 4.8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438