Bug 1902307
Summary: | [vSphere] cloud labels management via cloud provider makes nodes not ready | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Pietro Bertera <pbertera>
Component: | Cloud Compute | Assignee: | dmoiseev
Cloud Compute sub component: | Cloud Controller Manager | QA Contact: | Huali Liu <huliu>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | aos-bugs, dmoiseev, mfedosin, mimccune, rkant
Version: | 4.6 | |
Target Milestone: | --- | |
Target Release: | 4.11.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause:
Populating node zone labels requires contacting vCenter to obtain the label values. Because the kubelet attempts this very early in its initialization, it cannot yet read the vCenter credentials from the secret.
Consequence:
If the vCenter credentials are stored in a secret and region/zone parameters are present in cloud.conf, the kubelet cannot start, because it lacks the vCenter credentials needed to obtain the zone/region label values.
Fix:
For the vSphere platform with secret-based credentials, region and zone label population was moved out of the kubelet initialization sequence and into the kube-controller-manager part of the cloud provider code.
Result:
Region and zone labels now work properly, and credentials stored in a secret no longer cause the kubelet to hang.
|
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2022-08-10 10:35:34 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
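The Doc Text above refers to region and zone parameters in `cloud.conf`. A minimal sketch of the configuration combination that triggered the bug, assuming the in-tree vSphere provider's `secret-name`/`secret-namespace` options for secret-based credentials (names and values here are illustrative, not taken from the bug report):

```ini
; Illustrative vSphere cloud.conf fragment (assumed field names)
[Global]
; Credentials kept in a Kubernetes secret rather than inline,
; which is what the kubelet could not read early in startup:
secret-name = "vsphere-creds"
secret-namespace = "kube-system"

; Tag categories whose values become region/zone labels:
[Labels]
region = k8s-region
zone = k8s-zone
```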
Description
Pietro Bertera
2020-11-27 16:37:43 UTC
There has been some motion on the upstream issue recently; it looks like a fix may be in the pipeline. I suggest we wait for the moment to see if anything happens there.

We believe that this issue should be resolved as part of the out-of-tree cloud provider migration. We are currently aiming for a technical preview for vSphere in 4.10. Until then, we will try to mitigate the issue as much as possible via the proposed upstream patch; this won't fully resolve the issue, however.

We need to find someone upstream from the vSphere community to review the upstream PR. Nothing will be happening downstream with this for now.

@Denis, when you are back, could you please take a look at the upstream PR? There was some feedback from cheftako that hasn't been addressed. Perhaps if we can get those comments addressed we can make some progress on this for the next release.

No new feedback/comments there. The upstream PR is still waiting for some meaningful reviews.

*** Bug 2009037 has been marked as a duplicate of this bug. ***

The upstream PR has merged; this will be included in the cloud provider code once a rebase to 1.24 happens. Nothing we can do with this bug until the rebase occurs in a couple of sprints.

This is now waiting on the rebase to merge.

We need to set up RBAC for the fix within KCMO.

Verified on 4.11.0-0.nightly-2022-06-21-151125.

Steps:

1. Install an OCP cluster on vSphere.

```
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-21-151125   True        False         19m     Cluster version is 4.11.0-0.nightly-2022-06-21-151125
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                             STATUS   ROLES    AGE   VERSION
huliu-vs411-d5vqm-master-0       Ready    master   41m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-1       Ready    master   41m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-2       Ready    master   41m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-gmwww   Ready    worker   29m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-zfn9p   Ready    worker   29m   v1.24.0+284d62a
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                             PHASE     TYPE   REGION   ZONE   AGE
huliu-vs411-d5vqm-master-0       Running                          42m
huliu-vs411-d5vqm-master-1       Running                          42m
huliu-vs411-d5vqm-master-2       Running                          42m
huliu-vs411-d5vqm-worker-gmwww   Running                          39m
huliu-vs411-d5vqm-worker-zfn9p   Running                          39m
```

2. Edit the configMap cloud-provider-config, adding a [Labels] section.

```
liuhuali@Lius-MacBook-Pro huali-test % oc edit cm cloud-provider-config -n openshift-config
configmap/cloud-provider-config edited
...
[Labels]
region = k8s-region
zone = k8s-zone
...
```

3. Wait for all nodes to restart and become Ready again.

```
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                             STATUS                     ROLES    AGE    VERSION
huliu-vs411-d5vqm-master-0       Ready                      master   117m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-1       Ready                      master   117m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-2       Ready,SchedulingDisabled   master   117m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-gmwww   Ready                      worker   105m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-zfn9p   Ready                      worker   105m   v1.24.0+284d62a
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                             STATUS   ROLES    AGE    VERSION
huliu-vs411-d5vqm-master-0       Ready    master   125m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-1       Ready    master   125m   v1.24.0+284d62a
huliu-vs411-d5vqm-master-2       Ready    master   125m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-gmwww   Ready    worker   113m   v1.24.0+284d62a
huliu-vs411-d5vqm-worker-zfn9p   Ready    worker   113m   v1.24.0+284d62a
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                             PHASE     TYPE   REGION   ZONE   AGE
huliu-vs411-d5vqm-master-0       Running                          126m
huliu-vs411-d5vqm-master-1       Running                          126m
huliu-vs411-d5vqm-master-2       Running                          126m
huliu-vs411-d5vqm-worker-gmwww   Running                          123m
huliu-vs411-d5vqm-worker-zfn9p   Running                          123m
```

4. Attach tags to the VMs in the vSphere UI.

5. Check that zone and region are attached to the machines.

```
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                             PHASE     TYPE   REGION      ZONE      AGE
huliu-vs411-d5vqm-master-0       Running          tagregion   tagzone   4h19m
huliu-vs411-d5vqm-master-1       Running          tagregion   tagzone   4h19m
huliu-vs411-d5vqm-master-2       Running          tagregion   tagzone   4h19m
huliu-vs411-d5vqm-worker-gmwww   Running          tagregion   tagzone   4h15m
huliu-vs411-d5vqm-worker-zfn9p   Running          tagregion   tagzone   4h15m
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
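For illustration, the `[Labels]` keys added in step 2 of the verification are plain INI-style settings. A minimal sketch of reading them with Python's standard-library `configparser` (illustrative only; the actual cloud provider is implemented in Go, and this is not its implementation):

```python
# Parse the [Labels] section of a vSphere-style cloud.conf fragment.
# The content mirrors the values used in the verification steps above;
# this is a standalone illustration, not the provider's code.
import configparser

cloud_conf = """\
[Labels]
region = k8s-region
zone = k8s-zone
"""

parser = configparser.ConfigParser()
parser.read_string(cloud_conf)

# These values name the vSphere tag categories whose tags become
# the node/machine region and zone.
region_tag_category = parser["Labels"]["region"]
zone_tag_category = parser["Labels"]["zone"]
print(region_tag_category)  # k8s-region
print(zone_tag_category)    # k8s-zone
```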