Bug 2056928
| Summary: | Ingresscontroller LB scope change behaviour differs for different values of aws-load-balancer-internal annotation | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ravi Trivedi <travi> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Arvind iyengar <aiyengar> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | urgent | CC: | aiyengar, aos-bugs, aos-network-edge-staff, cblecker, hongli, mfisher, mifiedle, mmasters, nmalik, wking |
| Version: | 4.10 | Keywords: | ServiceDeliveryBlocker, Upgrades |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |

Doc Text:
Cause: The AWS cloud-provider implementation checks the "service.beta.kubernetes.io/aws-load-balancer-internal" service annotation to determine whether a service load-balancer (SLB) should be configured to be internal (as opposed to being public). The cloud-provider implementation recognizes both the value "0.0.0.0/0" and the value "true" as indicating that an SLB should be internal. The ingress operator in OpenShift 4.7 and earlier sets the value "0.0.0.0/0", and the ingress operator in OpenShift 4.8 and later sets the value "true" for services that the operator creates for internal SLBs. A service that was created on an older cluster might have the annotation value "0.0.0.0/0", which could cause comparisons that check for the "true" value to return the wrong result.
Consequence: When a cluster had an internal SLB that had been configured using the old annotation value and the cluster was upgraded to OpenShift 4.10, the ingress operator would report the Progressing=True clusteroperator status condition, preventing the upgrade from completing.
Fix: Logic was added to the ingress operator to normalize the service.beta.kubernetes.io/aws-load-balancer-internal service annotation for operator-managed services by replacing the value "0.0.0.0/0" with the value "true" (see the illustrative sketch after the metadata table below).
Result: The ingress operator no longer prevents upgrades of clusters with the "service.beta.kubernetes.io/aws-load-balancer-internal=0.0.0.0/0" annotation from completing.

| Story Points: | --- | | |
| Clone Of: | 2055470 | Environment: | |
| Last Closed: | 2022-03-10 16:44:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 2055470 | | |
| Bug Blocks: | 2057518 | | |
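The Cause and Fix described in the Doc Text above come down to a single string comparison: the AWS cloud provider accepts both "0.0.0.0/0" and "true" as meaning "internal", while newer operator code compares the annotation against "true" only. Below is a minimal Go sketch of the normalization idea, with hypothetical helper and constant names; the actual change lives in the cluster-ingress-operator (the log excerpt in the last comment points at ingress/load_balancer_service.go) and may differ in detail.

```go
package main

import "fmt"

const (
	// Annotation the AWS cloud provider checks to decide whether a service
	// load balancer should be internal rather than public.
	awsInternalLBAnnotation = "service.beta.kubernetes.io/aws-load-balancer-internal"

	// Value written by the ingress operator in OpenShift 4.7 and earlier.
	legacyInternalValue = "0.0.0.0/0"
	// Value written by the ingress operator in OpenShift 4.8 and later.
	currentInternalValue = "true"
)

// normalizeInternalAnnotation is a hypothetical helper illustrating the fix:
// it rewrites the legacy "0.0.0.0/0" value to "true" on operator-managed
// services so that later comparisons against "true" give the right answer on
// clusters upgraded from 4.7 or earlier. It reports whether the map changed.
func normalizeInternalAnnotation(annotations map[string]string) bool {
	if annotations[awsInternalLBAnnotation] == legacyInternalValue {
		annotations[awsInternalLBAnnotation] = currentInternalValue
		return true
	}
	return false
}

func main() {
	// Annotations as they might look on a router-default service created by 4.7.
	svcAnnotations := map[string]string{
		awsInternalLBAnnotation: legacyInternalValue,
	}

	// Without normalization, a check for the "true" value wrongly reports "not internal".
	fmt.Println("before:", svcAnnotations[awsInternalLBAnnotation] == "true")

	if normalizeInternalAnnotation(svcAnnotations) {
		fmt.Printf("normalized %s: %q\n", awsInternalLBAnnotation, svcAnnotations[awsInternalLBAnnotation])
	}

	// After normalization the same check agrees with the cloud provider's view.
	fmt.Println("after:", svcAnnotations[awsInternalLBAnnotation] == "true")
}
```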
Moving to MODIFIED. No 4.10 nightly includes the fix for this yet.

Verified in the "4.10.0-0.nightly-2022-02-24-034852" release version. Testing an upgrade from 4.9.23 to 4.10.0-0.nightly-2022-02-24-034852, it is observed that the patch works as intended and the upgrade completes successfully:
--------
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.23 True False 5m57s Cluster version is 4.9.23
oc -n openshift-ingress edit service/router-default
service/router-default edited
oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0   <-------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"
oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852 --allow-explicit-upgrade=true --force
Updating to release image registry.ci.openshift.org/ocp/release:4.10.0-0.nightly-2022-02-24-034852
Post upgrade:
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2022-02-24-034852 True False 9m52s Cluster version is 4.10.0-0.nightly-2022-02-24-034852
oc get co ingress
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
ingress 4.10.0-0.nightly-2022-02-24-034852 True False False 66m
oc -n openshift-ingress get service/router-default -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"   <-----------
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""
  creationTimestamp: "2022-02-24T06:26:49Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
--------
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
Tested with a cluster launched by cluster-bot (launch openshift/cluster-ingress-operator#705 aws), and the PR works as expected. After changing the scope to internal and manually changing the annotation value to "0.0.0.0/0", the ingress operator updates the annotation to "true" immediately.

$ oc -n openshift-ingress annotate svc/router-default service.beta.kubernetes.io/aws-load-balancer-internal="0.0.0.0/0" --overwrite
service/router-default annotated

$ oc -n openshift-ingress get svc/router-default -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "4"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: '*'
    traffic-policy.network.alpha.openshift.io/local-with-fallback: ""

### logs of ingress-operator
2022-02-23T04:57:26.536Z INFO operator.ingress_controller ingress/load_balancer_service.go:294 normalized annotation {"namespace": "openshift-ingress", "name": "router-default", "annotation": "service.beta.kubernetes.io/aws-load-balancer-internal", "old": "0.0.0.0/0", "new": "true"}

$ oc get clusterversion
NAME      VERSION                                                    AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest    True        False         20m     Cluster version is 4.10.0-0.ci.test-2022-02-23-041216-ci-ln-bqsc5qk-latest
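For completeness, the same inspection the `oc get -o yaml` commands above perform can be done programmatically. The following client-go sketch is hypothetical (not part of the operator or the verification steps): it reads the router-default service and reports which form of the annotation it carries, assuming a kubeconfig at the default location and sufficient permissions.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config); adjust as needed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// The operator-managed service for the default ingresscontroller.
	svc, err := client.CoreV1().Services("openshift-ingress").Get(context.TODO(), "router-default", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	const key = "service.beta.kubernetes.io/aws-load-balancer-internal"
	switch svc.Annotations[key] {
	case "0.0.0.0/0":
		fmt.Println("legacy value present; a fixed ingress operator will normalize it to \"true\"")
	case "true":
		fmt.Println("annotation already uses the current \"true\" value")
	default:
		fmt.Println("service is not annotated as internal")
	}
}
```

On a cluster that was installed before 4.8 and has not yet been reconciled by a fixed operator, the first case is the expected output.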