Bug 2078524
| Summary: | Downgrading a cluster from 4.11 to 4.10 is failed | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Shudi Li <shudili> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Shudi Li <shudili> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | aos-bugs, gspence, hongli, mmasters |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-03-09 01:18:01 UTC | Type: | Bug |
| Bug Depends On: | 2041616 | ||
| Bug Blocks: | |||
Description
Shudi Li
2022-04-25 13:56:42 UTC
Changed the bug title to "Downgrading a cluster from 4.11 to 4.10 is failed":

1. With the liveness probe and readiness probe timeouts configured to 5s, the downgrade from 4.11 to 4.10 has been stuck at "waiting on ingress" for more than 5 hours.
2. With the default liveness probe and readiness probe timeout of 1s, the downgrade from 4.11 to 4.10 reported "the cluster operator monitoring has not yet successfully rolled out":

a.
    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-04-25-171513   True        True          63m     Unable to apply 4.10.0-0.nightly-2022-04-24-083512: the cluster operator monitoring has not yet successfully rolled out

b.
    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-04-25-171513   True        True          169m    Unable to apply 4.10.0-0.nightly-2022-04-24-083512: the cluster operator monitoring has not yet successfully rolled out

I was unable to reproduce the original problem. I tried the following:

* Downgrade from 4.11.0-0.nightly-2022-04-24-135651 to 4.10.0-0.nightly-2022-04-24-083512.
* Downgrade from 4.11.0-0.nightly-2022-04-26-181148 to 4.10.12.
* Downgrade from 4.11.0-0.ci-2022-04-26-195435 to 4.10.0-0.nightly-2022-04-24-083512.

In all cases, the monitoring cluster operator got stuck, but the ingress operator downgraded fine (and reverted timeoutSeconds, as expected, after being downgraded).

If you are able to reproduce the downgrade failure with the ingress operator, can you get the clusteroperator JSON or YAML, the router deployment JSON or YAML, and the ingress operator logs? Alternatively, a must-gather archive would be helpful.

Downgrading from 4.11.0-0.nightly-2022-04-26-181148 to 4.10.0-0.nightly-2022-04-24-083512 does not show the original "waiting on ingress" issue, but it hits the "the cluster operator monitoring has not yet successfully rolled out" issue:

1.
    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-04-26-181148   True        True          12m     Working towards 4.10.0-0.nightly-2022-04-24-083512: 117 of 771 done (15% complete)

2.
    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-04-26-181148   True        True          168m    Unable to apply 4.10.0-0.nightly-2022-04-24-083512: the cluster operator monitoring has not yet successfully rolled out

Downgrading from 4.11.0-0.nightly-2022-04-24-135651 to 4.10.0-0.nightly-2022-04-24-083512 does show the original "waiting on ingress" issue:

    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-04-24-135651   True        True          46m     Working towards 4.10.0-0.nightly-2022-04-24-083512: 611 of 771 done (79% complete), waiting on ingress

Please refer to the attached logs.txt for the router deployment JSON or YAML, the ingress operator logs, and the clusteroperator JSON or YAML. Thank you very much!

Based on comment 11, the downgrade failure is not dependent on the new feature, and it is not new in 4.11; the failure can be reproduced when downgrading 4.10→4.9 just by having an ingresscontroller with a domain outside the cluster's base domain when initiating the downgrade. For that reason, I am marking this BZ as not a blocker. Note that fixing bug 2041616 should prevent the issue on AWS.

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9237
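
For anyone triaging similar reports, the two STATUS messages quoted in this bug ("Working towards ..., waiting on ingress" and "Unable to apply ...: the cluster operator monitoring has not yet successfully rolled out") can be told apart mechanically. The following is only a minimal sketch: the function name `classify_status` and the two regular expressions are hypothetical and cover just the message shapes seen in the comments above, not every status the cluster version operator can report.

```python
import re

# Assumption: these patterns match only the two STATUS shapes quoted in this
# bug report; other status messages fall through to "unknown".
PROGRESS_RE = re.compile(
    r"^Working towards (?P<target>\S+): (?P<done>\d+) of (?P<total>\d+) done"
    r" \((?P<pct>\d+)% complete\)(?:, waiting on (?P<waiting>.+))?$"
)
STUCK_RE = re.compile(
    r"^Unable to apply (?P<target>\S+): the cluster operator (?P<operator>\S+)"
    r" has not yet successfully rolled out$"
)


def classify_status(status: str) -> dict:
    """Classify a STATUS string copied from `oc get clusterversion` output."""
    status = status.strip()

    m = PROGRESS_RE.match(status)
    if m:
        return {
            "state": "progressing",
            "target": m["target"],
            "done": int(m["done"]),
            "total": int(m["total"]),
            # None when the status has no ", waiting on ..." suffix.
            "blocked_on": m["waiting"],
        }

    m = STUCK_RE.match(status)
    if m:
        return {
            "state": "stuck",
            "target": m["target"],
            "blocked_on": m["operator"],
        }

    return {"state": "unknown", "target": None, "blocked_on": None}


if __name__ == "__main__":
    # Status strings taken verbatim from the comments in this bug.
    samples = [
        "Working towards 4.10.0-0.nightly-2022-04-24-083512: "
        "611 of 771 done (79% complete), waiting on ingress",
        "Unable to apply 4.10.0-0.nightly-2022-04-24-083512: "
        "the cluster operator monitoring has not yet successfully rolled out",
    ]
    for s in samples:
        print(classify_status(s))
```

Here `blocked_on` distinguishes the original "waiting on ingress" symptom from the monitoring rollout failure, which is the distinction the comments above keep returning to.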