Bug 2055470
Summary: Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation

Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Version: 4.10
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: ServiceDeliveryBlocker, Upgrades
Reporter: Ravi Trivedi <travi>
Assignee: aos-network-edge-staff <aos-network-edge-staff>
QA Contact: Arvind iyengar <aiyengar>
CC: aos-bugs, cblecker, hongli, mfisher, mmasters, nmalik, wking

Doc Type: Bug Fix
Doc Text:
Cause: The AWS cloud-provider implementation checks the "service.beta.kubernetes.io/aws-load-balancer-internal" service annotation to determine whether a service load-balancer (SLB) should be configured to be internal (as opposed to being public). The cloud-provider implementation recognizes both the value "0.0.0.0/0" and the value "true" as indicating that an SLB should be internal. The ingress operator in OpenShift 4.7 and earlier sets the value "0.0.0.0/0", and the ingress operator in OpenShift 4.8 and later sets the value "true" for services that the operator creates for internal SLBs. A service that was created on an older cluster might have the annotation value "0.0.0.0/0", which could cause comparisons that check for the "true" value to return the wrong result.
Consequence: When a cluster had an internal SLB that had been configured using the old annotation value and the cluster was upgraded to OpenShift 4.10, the ingress operator would report the Progressing=True clusteroperator status condition, preventing the upgrade from completing.
Fix: Logic was added to the ingress operator to normalize the service.beta.kubernetes.io/aws-load-balancer-internal service annotation for operator-managed services by replacing the value "0.0.0.0/0" with the value "true".
Result: The ingress operator no longer prevents upgrades of clusters with the "service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0" annotation from completing.
Clones: 2056928, 2057518
Bug Blocks: 2056928
Last Closed: 2022-08-10 10:50:22 UTC
Type: Bug
Description
Ravi Trivedi 2022-02-17 03:52:33 UTC
Resetting the target-release field and the priority/severity/blocker- fields as Miciah had intended.

The PR merge made it into the "4.11.0-0.ci-2022-02-22-163446" image as of writing. Performing an upgrade test from 4.10.0-rc.3 to 4.11.0-0.ci-2022-02-22-163446, it is observed that the fix works as intended: the "service.beta.kubernetes.io/aws-load-balancer-internal" annotation reverts to "true" when the ingress operator gets upgraded during the process:

```
oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.3   True        False         27m     Cluster version is 4.10.0-rc.3

oc -n openshift-ingress edit service/router-default
service/router-default edited

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0    <--------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-23T05:35:06Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup

oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.11.0-0.ci-2022-02-22-163446 --allow-explicit-upgrade=true --force
Updating to release image registry.ci.openshift.org/ocp/release:4.11.0-0.ci-2022-02-22-163446

oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-rc.3   True        True          2m56s   Working towards 4.11.0-0.ci-2022-02-22-163446: 95 of 773 done (12% complete)

oc get co ingress
NAME      VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.10.0-rc.3   True        True          False      54m     ingresscontroller "default" is progressing: ScopeChanged: The IngressController scope was changed from "External" to "Internal". To effectuate this change, you must delete the service: `oc -n openshift-ingress delete svc/router-default`; the service load-balancer will then be deprovisioned and a new one created. This will most likely cause the new load-balancer to have a different host name and IP address from the old one's. Alternatively, you can revert the change to the IngressController: `oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"External"}}}}'.
insights   4.10.0-rc.3   True       False         False      53m
```

Post upgrade:

```
oc get co ingress
NAME      VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.11.0-0.ci-2022-02-22-163446   True        False         False      88m

oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"    <--------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-23T05:35:06Z"
```

Based on the above outcome, marking this BZ as "verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
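The normalization described in the Doc Text can be sketched as follows. This is an illustrative Go snippet, not the ingress operator's actual code: the function name `normalizeInternalAnnotation` is hypothetical, and the real logic in openshift/cluster-ingress-operator operates on a Kubernetes `Service` object rather than a bare map. It shows the essential step: rewriting the legacy "0.0.0.0/0" value to "true" so that later comparisons against "true" succeed.

```go
package main

import "fmt"

// Annotation key checked by the AWS cloud-provider implementation to decide
// whether a service load-balancer should be internal.
const internalLBAnnotation = "service.beta.kubernetes.io/aws-load-balancer-internal"

// normalizeInternalAnnotation (hypothetical name) rewrites the legacy
// "0.0.0.0/0" value set by OpenShift 4.7 and earlier to the "true" value used
// by 4.8 and later. It reports whether a change was made.
func normalizeInternalAnnotation(annotations map[string]string) bool {
	if annotations[internalLBAnnotation] == "0.0.0.0/0" {
		annotations[internalLBAnnotation] = "true"
		return true
	}
	return false
}

func main() {
	// A service created on an older cluster carries the legacy value.
	annotations := map[string]string{internalLBAnnotation: "0.0.0.0/0"}
	fmt.Println(normalizeInternalAnnotation(annotations)) // true: legacy value rewritten
	fmt.Println(annotations[internalLBAnnotation])        // "true"
	fmt.Println(normalizeInternalAnnotation(annotations)) // false: already normalized, no-op
}
```

Because both values mean "internal" to the cloud provider, this rewrite changes nothing about the provisioned load-balancer; it only makes the operator's string comparison stop misreporting a scope change.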