Bug 2056928
Summary: | Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ravi Trivedi <travi> |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
Networking sub component: | router | QA Contact: | Arvind iyengar <aiyengar> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | urgent | CC: | aiyengar, aos-bugs, aos-network-edge-staff, cblecker, hongli, mfisher, mifiedle, mmasters, nmalik, wking |
Version: | 4.10 | Keywords: | ServiceDeliveryBlocker, Upgrades |
Target Milestone: | --- | ||
Target Release: | 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: |
Cause: The AWS cloud-provider implementation checks the "service.beta.kubernetes.io/aws-load-balancer-internal" service annotation to determine whether a service load-balancer (SLB) should be configured to be internal (as opposed to being public). The cloud-provider implementation recognizes both the value "0.0.0.0/0" and the value "true" as indicating that an SLB should be internal. The ingress operator in OpenShift 4.7 and earlier sets the value "0.0.0.0/0", and the ingress operator in OpenShift 4.8 and later sets the value "true" for services that the operator creates for internal SLBs. A service that was created on an older cluster might have the annotation value "0.0.0.0/0", which could cause comparisons that check for the "true" value to return the wrong result.
Consequence: When a cluster had an internal SLB that had been configured using the old annotation value and the cluster was upgraded to OpenShift 4.10, the ingress operator would report the Progressing=True clusteroperator status condition, preventing the upgrade from completing.
Fix: Logic was added to the ingress operator to normalize the service.beta.kubernetes.io/aws-load-balancer-internal service annotation for operator-managed services by replacing the value "0.0.0.0/0" with the value "true".
Result: The ingress operator no longer prevents upgrades of clusters with the "service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0" annotation from completing.
|
Story Points: | --- |
Clone Of: | 2055470 | Environment: | |
Last Closed: | 2022-03-10 16:44:19 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2055470 | ||
Bug Blocks: | 2057518 |
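
The fix described in the Doc Text above boils down to a one-line value rewrite on the services that the ingress operator manages. The following Go sketch is an illustration only, with hypothetical names; it is not the actual cluster-ingress-operator code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// awsInternalLBAnnotation is the annotation the AWS cloud provider checks to
// decide whether a service load balancer should be internal.
const awsInternalLBAnnotation = "service.beta.kubernetes.io/aws-load-balancer-internal"

// normalizeInternalAnnotation rewrites the legacy value "0.0.0.0/0" (written by
// the ingress operator in OpenShift 4.7 and earlier) to "true" (written by 4.8
// and later), so later comparisons against "true" give the expected result.
// It reports whether the service was modified, signalling to the caller that an
// update needs to be pushed to the API server.
func normalizeInternalAnnotation(svc *corev1.Service) bool {
	if svc.Annotations[awsInternalLBAnnotation] == "0.0.0.0/0" {
		svc.Annotations[awsInternalLBAnnotation] = "true"
		return true
	}
	return false
}

func main() {
	// A service carried over from an older cluster with the legacy value.
	svc := &corev1.Service{}
	svc.Annotations = map[string]string{awsInternalLBAnnotation: "0.0.0.0/0"}

	if normalizeInternalAnnotation(svc) {
		fmt.Println("annotation normalized to:", svc.Annotations[awsInternalLBAnnotation])
	}
}
```

Because both values already mean "internal" to the AWS cloud provider, the rewrite does not change the load balancer's scope; it only makes the annotation match the "true" value the operator compares against.
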
Comment 2
Hongan Li
2022-02-23 05:07:13 UTC
Moving to MODIFIED. No 4.10 nightly includes the fix for this yet.

Verified in the "4.10.0-0.nightly-2022-02-24-034852" release version. Testing an upgrade from 4.9.23 to 4.10.0-0.nightly-2022-02-24-034852, it is observed that the patch works as intended and the upgrade completes successfully:

--------
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.23    True        False         5m57s   Cluster version is 4.9.23

oc -n openshift-ingress edit service/router-default
service/router-default edited

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0        <-------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"

oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852 --allow-explicit-upgrade=true --force
Updating to release image registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852

Post upgrade:

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-02-24-034852   True        False         9m52s   Cluster version is 4.10.0-0.nightly-2022-02-24-034852

oc get co ingress
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.10.0-0.nightly-2022-02-24-034852   True        False         False      66m

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"        <-----------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
--------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
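
As a side note, the annotation check that the verification above performs with `oc get service` can also be done programmatically. The snippet below is a hedged illustration using client-go; it is not part of the bug or the fix, and it assumes a kubeconfig pointed at the cluster (taken from $KUBECONFIG) plus the namespace and service name shown in the verification output:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the same kubeconfig that "oc" uses.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Fetch the LoadBalancer service that the default ingresscontroller manages.
	svc, err := client.CoreV1().Services("openshift-ingress").Get(
		context.TODO(), "router-default", metav1.GetOptions{})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// The AWS cloud provider treats both "true" and the legacy "0.0.0.0/0" as
	// requests for an internal load balancer; after the fix the ingress
	// operator rewrites the legacy value to "true".
	val := svc.Annotations["service.beta.kubernetes.io/aws-load-balancer-internal"]
	switch val {
	case "true", "0.0.0.0/0":
		fmt.Printf("router-default is internal (annotation value %q)\n", val)
	default:
		fmt.Printf("router-default is public (annotation value %q)\n", val)
	}
}
```

Accepting both "true" and "0.0.0.0/0" in the check mirrors the cloud-provider behaviour described in the Doc Text, which is why the pre-upgrade and post-upgrade services in the transcript both provision an internal load balancer.
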