Bug 2037276
| Summary: | [IBMCLOUD] vpc-node-label-updater may fail to label nodes appropriately | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chao Yang <chaoyang> |
| Component: | Storage | Assignee: | Jonathan Dobson <jdobson> |
| Storage sub component: | Storage | QA Contact: | Chao Yang <chaoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | aos-bugs, cholman, jdobson, jnowicki, jsafrane, pamoedom, qili |
| Version: | 4.10 | Keywords: | TestBlocker |
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-03-10 16:37:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Chao Yang
2022-01-05 11:07:34 UTC
It seems to me that the node labeller should retry a few times before giving up (with exponential backoff?). In addition, Kubernetes should not even start the driver container until all init containers succeed. Does the labeller return a proper exit code?

*** Bug 2034886 has been marked as a duplicate of this bug. ***

Sure, will check. It would be good if we could get the cluster for debugging; that would make things easier for the developer.

It looks like the following things need to be done:
1- The vpc-node-label-updater init container should fail in any unexpected case, which will stop the other containers in the same pod from running.
2- Let Kubernetes/OpenShift retry the init container until it succeeds.
That is the fix we will put in.

> 2- Let Kubernetes/OpenShift retry the init container until it succeeds.
Maybe it should exit after a few minutes of trying; waiting forever looks scary. But both 1. and 2. look good.
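
The behaviour proposed in the comments above (retry with exponential backoff, stop after a bounded amount of time, and report failure through the exit code so that the init container fails and Kubernetes restarts it) can be sketched roughly as follows. This is only an illustrative Python sketch, not the code from the actual fix: the function name `label_node_with_retry`, the `label_node` callable, and all timing defaults are hypothetical.

```python
import sys
import time
from typing import Callable


def label_node_with_retry(
    label_node: Callable[[], None],   # hypothetical: raises on failure
    max_elapsed: float = 300.0,       # assumed overall budget, in seconds
    initial_delay: float = 1.0,       # assumed first backoff delay
    factor: float = 2.0,              # backoff multiplier
    max_delay: float = 60.0,          # cap on a single delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Try to label the node; return a process exit code.

    Returns 0 on success, 1 once the time budget is exhausted.
    """
    deadline = clock() + max_elapsed
    delay = initial_delay
    while True:
        try:
            label_node()
            return 0
        except Exception as err:
            remaining = deadline - clock()
            if remaining <= 0:
                # Give up; a non-zero exit code makes the init container
                # fail, so the driver container is not started.
                print(f"labelling failed, giving up: {err}", file=sys.stderr)
                return 1
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * factor, max_delay)
```

An entry point would then call something like `sys.exit(label_node_with_retry(my_label_fn))`, where `my_label_fn` is a placeholder for the real labelling logic. Bounding the total time keeps a single run from waiting forever, while the non-zero exit code leaves further retries to Kubernetes.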
We have made the code changes and created a release tag here: https://github.com/IBM/vpc-node-label-updater/releases/tag/v4.1.1

Moving the ball to Red Hat to merge the upstream fix.

Marking as blocker; this can cause a whole cluster installation to fail.

Installed the cluster several times successfully with 4.10.0-0.nightly-2022-01-25-023600.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056