Bug 2042826
| Summary: | [SNO] the replicas of ingresscontroller/default is 2 on new installed SNO private cluster | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Shudi Li <shudili> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | | |
| Priority: | medium | CC: | aos-bugs, gpei, mmasters, shudili, yunjiang |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-02 01:38:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Hongan Li
2022-01-20 08:25:37 UTC
Setting blocker+ because this breaks the install for SNO+private. The issue probably lies in the generation of the default ingresscontroller manifest that the installer uses when the install-config specifies that a private cluster is desired.

Is this a regression from 4.9, or is this also broken on 4.9 (and probably earlier releases too)?

The issue can be reproduced in 4.9.0-0.nightly-2022-01-20-172411

1.
% oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2022-01-20-172411   True        False         59m     Error while reconciling 4.9.0-0.nightly-2022-01-20-172411: the cluster operator ingress is degraded
%

2.
% oc get node
NAME                                        STATUS   ROLES           AGE   VERSION
ip-10-0-62-116.us-east-2.compute.internal   Ready    master,worker   70m   v1.22.3+e790d7f
%

3.
% oc get co/ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.9.0-0.nightly-2022-01-20-172411   True        False         True       65m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-6b6fbf7f7f-qfkzs" cannot be scheduled: 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. Make sure you have sufficient worker nodes.), DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
%

Thanks! Because this is not a regression, I am clearing the blocker flag. However, I have already posted a fix for review anyway.

After discussing with the installer team, we've decided that the appropriate way to resolve the issue is to change the operator's defaulting behavior when spec.replicas is omitted on an IngressController.

Blocked on getting a reviewer for <https://github.com/openshift/api/pull/1103>.

Moving this BZ off of 4.10.0; we'll get it in a later release.

This BZ is somewhat related to this proposed enhancement: <https://github.com/openshift/enhancements/pull/1041>.
I'll keep this BZ on the backlog for now.

https://github.com/openshift/cluster-ingress-operator/pull/728/commits/d52a837623d29d8b265bf3fa9e395a37be778f78 for https://issues.redhat.com/browse/MGMT-9797 should have fixed the issue. Please verify and let me know if there is still an issue.

Verified it with 4.11.0-0.nightly-2022-10-26-170309 on an SNO cluster

1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-10-26-170309   True        False         12m     Cluster version is 4.11.0-0.nightly-2022-10-26-170309
%

2.
% oc get infrastructures.config.openshift.io cluster -oyaml
status:
  apiServerInternalURI: https://api-int.shudi-411snop12.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.shudi-411snop12.qe.devcluster.openshift.com:6443
  controlPlaneTopology: SingleReplica
  etcdDiscoveryDomain: ""
  infrastructureName: shudi-411snop12-2tc8f
  infrastructureTopology: SingleReplica   <---
  platform: AWS
  platformStatus:
    aws:
      region: us-east-2
    type: AWS
%

3.
% oc get node
NAME                                        STATUS   ROLES           AGE   VERSION
ip-10-0-54-255.us-east-2.compute.internal   Ready    master,worker   31m   v1.24.6+5157800
%

4. check the router pod; only one pod, as expected
shudi@Shudis-MacBook-Pro ~ % oc -n openshift-ingress get pods
NAME                             READY   STATUS    RESTARTS      AGE
router-default-c86b8754f-jkj8m   1/1     Running   3 (22m ago)   29m
%

5.
% oc get co/ingress
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.11.0-0.nightly-2022-10-26-170309   True        False         False      21m
%

The change mentioned in comment 11 shipped in the 4.11.0 GA release, so I am changing the resolution of this BZ to "CURRENTRELEASE". |
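
For readers following the thread, the defaulting change discussed above (choosing a replica count when spec.replicas is omitted, based on the cluster's infrastructure topology) might be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the `TopologyMode` type, its constants, and the `effectiveReplicas` helper are hypothetical names redeclared here so the example is self-contained, and the default of 2 for non-SNO clusters is inferred from the symptom in this bug. The real operator may weigh other inputs as well.

```go
package main

import "fmt"

// TopologyMode mirrors the string values seen in the Infrastructure status
// output in this bug. It is redeclared here purely for illustration.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	SingleReplica   TopologyMode = "SingleReplica"
)

// effectiveReplicas is a hypothetical helper. It returns the explicit
// spec.replicas value when one is set; otherwise it falls back to a
// default chosen from the infrastructure topology.
func effectiveReplicas(specReplicas *int32, infraTopology TopologyMode) int32 {
	if specReplicas != nil {
		// An explicit value always wins over the default.
		return *specReplicas
	}
	if infraTopology == SingleReplica {
		// One node cannot satisfy pod anti-affinity for two router pods.
		return 1
	}
	// Assumed default for multi-node clusters, per the symptom reported here.
	return 2
}

func main() {
	fmt.Println(effectiveReplicas(nil, SingleReplica))   // spec.replicas omitted on SNO
	fmt.Println(effectiveReplicas(nil, HighlyAvailable)) // spec.replicas omitted on HA

	three := int32(3)
	fmt.Println(effectiveReplicas(&three, SingleReplica)) // explicit value is kept
}
```

With a default like this, an IngressController manifest that leaves spec.replicas unset no longer asks a single-node cluster to schedule two router pods that anti-affinity rules would keep apart.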