Bug 1946584
Summary: | Machine-config controller fails to generate MC when a machine config pool with dashes in its name is present in the cluster | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Sabina Aledort <saledort> | ||||||||
Component: | Node | Assignee: | Qi Wang <qiwan> | ||||||||
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> | ||||||||
Status: | CLOSED ERRATA | Docs Contact: | |||||||||
Severity: | high | ||||||||||
Priority: | high | CC: | alukiano, aos-bugs, dcain, jerzhang, nagrawal, oarribas, qiwan, schoudha, umohnani, william.caban | ||||||||
Version: | 4.8 | ||||||||||
Target Milestone: | --- | ||||||||||
Target Release: | 4.8.0 | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 2008588 | Environment: |||||||||||
Last Closed: | 2021-07-27 22:57:48 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 2008588 | ||||||||||
Attachments: |
|
Description
Sabina Aledort
2021-04-06 12:59:46 UTC
@msivak Assigned to you since it's a failing compute test.

Created attachment 1769560: tests-artifacts

Created attachment 1770195: PAO must gather
There are three worker nodes in the cluster:
- cnfdd5, with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-cnf: ""
- cnfdd6, with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-cnf: ""
- cnfdd7, with labels node-role.kubernetes.io/worker: "" and node-role.kubernetes.io/worker-duprofile: ""

and two performance profiles:
- perf-example.yaml, with nodeSelector node-role.kubernetes.io/worker-duprofile: ""
- performance.yaml, with nodeSelector node-role.kubernetes.io/worker-cnf: ""

This means that cnfdd7 should be targeted and the profile should be applied to it.

*** Bug 1946591 has been marked as a duplicate of this bug. ***

In performance-perf-example.yaml I noticed the following:

    status:
      conditions:
      - lastTransitionTime: "2021-04-07T14:50:58Z"
        message: 'could not get kubelet config key: error converting kubelet to int: strconv.Atoi: parsing "kubelet": invalid syntax'
        status: "False"
        type: Failure

*** Bug 1946589 has been marked as a duplicate of this bug. ***

1. I checked the node labels, the MCP selectors in the performance profile, and the node selectors in the target MCP; based on all of that, the profile should have been applied to node cnfdd7.
2. I tried to recreate the issue in my environment. I installed the same MCP, configured a node with the same labels, and created the performance profile, which was applied without any issue. I ran the cnf-tests in discovery mode, and the only tests that failed did so because the stalld daemon was running on the host. This is a known issue and is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1949027

It looks like a bug in the machine config daemon; I saw the same issue on a different machine. This is where it fails: https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/helpers.go#L179, so it is worth checking the relevant annotations on the KubeletConfig CR and the generated MC.

1. Once we create the first pool with a dash in its name (worker-cnf) and PAO creates a KubeletConfig for it, all is good.
2. We create a second pool, and PAO creates a KubeletConfig for it.
   a. The generated name for the first MC is 99-worker-cnf-generated-kubelet. Because the pool name contains a dash, the code thinks the name already carries a suffix (https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/kubelet_config_controller.go#L572) and creates an additional MC named 99-worker-cnf-generated-kubelet-kubelet, where kubelet is recorded as the suffix in the annotation machineconfiguration.openshift.io/mc-name-suffix: kubelet.
   b. Once it tries to generate the MC for the new KubeletConfig, it fails at https://github.com/openshift/machine-config-operator/blob/0c69300057bac1ea65d544ab0e22b378690b2488/pkg/controller/kubelet-config/helpers.go#L179

Given that the relevant code lives in the kubeletconfigcontroller, moving this over to the node team to take a look.

*** Bug 1946588 has been marked as a duplicate of this bug. ***

Hello, will this be addressed in the 4.8 GA? Not being able to use dashes seems like a regression, as this worked in previous releases. For example, this used to work: ran-du-eng1-smci00-profile0 (confirmed on 4.6).

The target release is set to 4.8.0, so it will be in the 4.8 GA.

Can you confirm that statement? I tried this in the latest 4.8.0-rc.3 and it definitely did not work. Which RC will the fix be contained in? ran-du-eng1-smci00-profile0 did not work. ran.du.fec3.dell03.profile0 worked.

Correction: ran-du-eng1-smci00-profile0 did not work. ran.du.eng1.smci00.profile0 worked. Cluster version is 4.8.0-rc.3.

@schoudha Do you know if the fix is going to be in 4.8 GA?

I verified the fix is in 4.8.0-0.ci. I created two KubeletConfigs and the suffix is as expected.
    $ oc get mc
    NAME                                GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    99-worker-cnf-generated-kubelet     29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             3m21s
    99-worker-cnf-generated-kubelet-1   29813c845a4a3ee8e6856713c585aca834e0bf1e   3.2.0             4s

    $ oc describe kubeletconfig.machineconfiguration.openshift.io/worker-cnf
    Status:
      Conditions:
        Last Transition Time:  2021-07-09T18:01:55Z
        Message:               Success
        Status:                True
        Type:                  Success
    Events:  <none>

@Qi Wang it will be released in 4.8 GA, as the target release is 4.8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438 |
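The suffix-parsing failure discussed in the comments above can be sketched in Go, the language of the linked machine-config-operator code. This is a hypothetical illustration, not the MCO implementation: the helper names generatedBase, naiveSuffix, poolAwareSuffix, and nextName are invented here, and the "base name has exactly four dash-separated fields" rule is an assumed stand-in for whatever heuristic the controller actually used. It shows how counting dash-separated fields misreads a pool name such as worker-cnf and produces the same strconv.Atoi error seen in the performance profile status. It also shows how anchoring on the full pool-specific base name avoids that failure.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// HYPOTHETICAL sketch: these helpers illustrate the failure mode described
// in this bug report; they are NOT the actual machine-config-operator code.

// generatedBase returns the base MachineConfig name for a pool, following
// the pattern visible in the `oc get mc` output (99-<pool>-generated-kubelet).
func generatedBase(pool string) string {
	return fmt.Sprintf("99-%s-generated-kubelet", pool)
}

// naiveSuffix assumes the base name always has exactly four dash-separated
// fields, so any extra field must be a numeric suffix. A dash inside the
// pool name breaks that assumption.
func naiveSuffix(mcName string) (int, error) {
	parts := strings.Split(mcName, "-")
	if len(parts) <= 4 {
		return 0, nil // no suffix: this is the first generated MC
	}
	return strconv.Atoi(parts[len(parts)-1])
}

// poolAwareSuffix strips the exact base name for the pool first, so dashes
// inside the pool name cannot be mistaken for a suffix separator.
func poolAwareSuffix(mcName, pool string) (int, error) {
	base := generatedBase(pool)
	if mcName == base {
		return 0, nil
	}
	if !strings.HasPrefix(mcName, base+"-") {
		return 0, fmt.Errorf("%q is not a generated kubelet MC for pool %q", mcName, pool)
	}
	return strconv.Atoi(strings.TrimPrefix(mcName, base+"-"))
}

// nextName returns the name for the next generated MC, given the suffix
// of the newest existing one (0 means the unsuffixed base name).
func nextName(pool string, suffix int) string {
	return fmt.Sprintf("%s-%d", generatedBase(pool), suffix+1)
}

func main() {
	name := generatedBase("worker-cnf") // 99-worker-cnf-generated-kubelet

	// The naive parser treats "kubelet" as the suffix.
	_, err := naiveSuffix(name)
	fmt.Println("naive:", err) // naive: strconv.Atoi: parsing "kubelet": invalid syntax

	// The pool-aware parser recognizes the base name and picks suffix 1 next.
	s, err := poolAwareSuffix(name, "worker-cnf")
	fmt.Println("pool-aware:", s, err, nextName("worker-cnf", s))
}
```

With a dash-free pool name such as worker, both parsers agree, which is consistent with the reports above that dotted profile names worked while dashed ones did not. For a second KubeletConfig on worker-cnf, the pool-aware version yields 99-worker-cnf-generated-kubelet-1, matching the verification output; this report does not say whether the actual fix takes this exact approach.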